Yes, we currently use traits that have methods. Something like “supports reading missing columns” doesn’t need to provide any methods. The other case is where we don’t have an object to test for a trait (scan.isInstanceOf[SupportsBatch]) until we have a Scan with pushdown done. Getting to that point could be expensive, so we can use a capability to fail faster.
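
To make the difference concrete, here is a rough sketch of the two styles side by side. Everything in it is hypothetical and only for illustration: the Table and Scan traits, buildScan, the scansBuilt counter, and the "batch-scan" string are stand-ins I made up, not the actual DataSourceV2 API. Only the isSupported call and the two capability names come from the proposal.

```scala
// Hypothetical sketch: these traits are stand-ins, not the real DataSourceV2 API.
object CapabilitySketch {
  // Trait-style check: needs an object to test, here an already-built Scan.
  trait Scan
  trait SupportsBatch extends Scan

  // Capability-style check: the table answers by name, no extra methods needed.
  trait Table {
    def capabilities: Set[String]
    def isSupported(capability: String): Boolean =
      capabilities.contains(capability)
    // Stand-in for the potentially expensive step (pushdown, planning).
    def buildScan(): Scan
  }

  final class BatchOnlyTable extends Table {
    // Counts how often the expensive step ran, for illustration only.
    var scansBuilt = 0
    // "batch-scan" is an invented name; the second string is from the proposal.
    val capabilities: Set[String] =
      Set("batch-scan", "read-missing-columns-as-null")
    def buildScan(): Scan = {
      scansBuilt += 1
      new SupportsBatch {}
    }
  }

  // Fail fast: consult the capability before any scan is built.
  def planContinuousRead(table: Table): Either[String, Scan] =
    if (!table.isSupported("continuous-streaming"))
      Left("table does not support continuous-streaming")
    else
      Right(table.buildScan())
}
```

With something like this, planContinuousRead(new BatchOnlyTable) is rejected before buildScan is ever called, whereas the isInstanceOf check can only run after a Scan exists.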

On Thu, Nov 8, 2018 at 1:54 PM Reynold Xin <r...@databricks.com> wrote:

> This is currently accomplished by having traits that data sources can
> extend, as well as runtime exceptions right? It's hard to argue one way vs
> another without knowing how things will evolve (e.g. how many different
> capabilities there will be).
>
>
> On Thu, Nov 8, 2018 at 12:50 PM Ryan Blue <rb...@netflix.com.invalid>
> wrote:
>
>> Hi everyone,
>>
>> I’d like to propose an addition to DataSourceV2 tables, a capability API.
>> This API would allow Spark to query a table to determine whether it
>> supports a capability or not:
>>
>>     val table = catalog.load(identifier)
>>     val supportsContinuous = table.isSupported("continuous-streaming")
>>
>> There are a couple of use cases for this. First, we want to be able to
>> fail fast when a user tries to stream a table that doesn’t support it. The
>> design of our read implementation doesn’t necessarily support this. If we
>> want to share the same “scan” across streaming and batch, then we need to
>> “branch” in the API after that point, but that is at odds with failing
>> fast. We could use capabilities to fail fast and not worry about that
>> concern in the read design.
>>
>> I also want to use capabilities to change the behavior of some validation
>> rules. The rule that validates appends, for example, doesn’t allow a write
>> that is missing an optional column. That’s because the current v1 sources
>> don’t support reading when columns are missing. But Iceberg does support
>> reading a missing column as nulls, so that users can add a column to a
>> table without breaking a scheduled job that populates the table. To fix
>> this problem, I would use a table capability, like
>> read-missing-columns-as-null.
>>
>> Any comments on this approach?
>>
>> rb
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>

--
Ryan Blue
Software Engineer
Netflix