Yes, we currently use traits that have methods. Something like “supports
reading missing columns” doesn’t need to provide any methods. The other
case is where we don’t have an object to test for a trait
(scan.isInstanceOf[SupportsBatch]) until we have a Scan with pushdown done.
Getting to that point could be expensive, so we can use a capability to
fail faster.
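
To make the difference concrete, here is a minimal sketch of a
capability check that runs before any scan is built. Everything in it is
hypothetical and only for illustration: the Table trait, SimpleTable,
requireCapability, and the "batch-scan" string are assumptions, not the
actual DataSourceV2 API. Only the isSupported call and the other two
capability names come from the proposal.

```scala
// Hypothetical sketch; these names are NOT the real DataSourceV2 API.
object CapabilitySketch {
  // A table reports its capabilities as plain strings, as in the proposal.
  trait Table {
    def name: String
    def capabilities: Set[String]
    def isSupported(capability: String): Boolean =
      capabilities.contains(capability)
  }

  // Illustrative table implementation backed by a fixed capability set.
  final case class SimpleTable(name: String, capabilities: Set[String])
      extends Table

  // Fail fast: the check needs only the table, not a Scan with pushdown done.
  def requireCapability(table: Table, capability: String): Either[String, Table] =
    if (table.isSupported(capability)) Right(table)
    else Left(s"Table ${table.name} does not support $capability")

  def main(args: Array[String]): Unit = {
    // "batch-scan" is a made-up capability name used only in this sketch.
    val batchOnly = SimpleTable("events", Set("batch-scan"))
    val lenient =
      SimpleTable("logs", Set("batch-scan", "read-missing-columns-as-null"))

    // Rejected up front, before any scan is configured.
    println(requireCapability(batchOnly, "continuous-streaming"))
    // A validation rule could consult this flag; no extra methods needed.
    println(lenient.isSupported("read-missing-columns-as-null"))
  }
}
```

The second check is the “no methods needed” case: the capability is
just a flag that a validation rule can read.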

On Thu, Nov 8, 2018 at 1:54 PM Reynold Xin <r...@databricks.com> wrote:

> This is currently accomplished by having traits that data sources can
> extend, as well as runtime exceptions right? It's hard to argue one way vs
> another without knowing how things will evolve (e.g. how many different
> capabilities there will be).
>
>
> On Thu, Nov 8, 2018 at 12:50 PM Ryan Blue <rb...@netflix.com.invalid>
> wrote:
>
>> Hi everyone,
>>
>> I’d like to propose an addition to DataSourceV2 tables, a capability API.
>> This API would allow Spark to query a table to determine whether it
>> supports a capability or not:
>>
>> val table = catalog.load(identifier)
>> val supportsContinuous = table.isSupported("continuous-streaming")
>>
>> There are a couple of use cases for this. First, we want to be able to
>> fail fast when a user tries to stream a table that doesn’t support it. The
>> design of our read implementation doesn’t necessarily support this. If we
>> want to share the same “scan” across streaming and batch, then we need to
>> “branch” in the API after that point, but that is at odds with failing
>> fast. We could use capabilities to fail fast and not worry about that
>> concern in the read design.
>>
>> I also want to use capabilities to change the behavior of some validation
>> rules. The rule that validates appends, for example, doesn’t allow a write
>> that is missing an optional column. That’s because the current v1 sources
>> don’t support reading when columns are missing. But Iceberg does support
>> reading a missing column as nulls, so that users can add a column to a
>> table without breaking a scheduled job that populates the table. To fix
>> this problem, I would use a table capability, like
>> read-missing-columns-as-null.
>>
>> Any comments on this approach?
>>
>> rb
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>

-- 
Ryan Blue
Software Engineer
Netflix
