[ 
https://issues.apache.org/jira/browse/DRILL-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Alves updated DRILL-13:
-----------------------------

    Comment: was deleted

(was: My interest in BFs (Bloom filters) is not so much in advertising that 
the underlying engine supports them for generic purposes (even though that 
might be useful for some obscure optimization choices); my interest is in 
using them in large-scale joins.

My assumption is that large-scale joins will be composed of two parts: a 
local part below the SE (storage engine) layer that handles node-local data, 
and a part above the SE layer that coordinates the join across cluster nodes.

Now of course in an ideal world we could have portable-format BFs that could 
be used for semi-joins across datasource formats, but that is much harder than 
what I'm proposing.

I'm proposing to start with a portable BF definition, while the BF itself 
would be opaque and could only be used for intra-datasource joins (across HBase 
nodes or across Cassandra nodes, but not between HBase and Cassandra).

Now I agree with your definition of what the real use cases are, but the join 
coordination layer would still sit above the SE, which means that we could use 
the same code for both HBase and Cassandra at this layer, since we don't care 
about the BF format. The layer would only need access to the BF definition and 
to the BF itself in opaque form.

I know this is certainly not a design priority, but I do think BF definition 
info would sit nicely with the partitioning info and would not require much 
beyond that.

In any case I'll try and eat my dog food, i.e. output some code that 
illustrates what I'm saying and maybe you can take a look and tell me what you 
think.

Now all of this is a moot point if the consensus is that everything should 
happen below the SE layer (i.e. in a multi-phase join both phases happen under 
the SE layer, which just provides a reader abstraction over the joined data). 
In that case I do think we'd be losing a good opportunity for reuse and, worst 
of all, it would require a completely different implementation for 
inter-datasource joins (e.g. joining data from HBase and an RDBMS).)
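
As a rough illustration of the split described in the deleted comment above, 
the sketch below pairs a portable BF definition with an opaque BF payload and 
puts an engine-agnostic semi-join pre-filter above the SE layer. All names 
here (BloomFilterDefinition, OpaqueBloomFilter, StorageEngine, ToyEngine, 
semiJoinCandidates) are hypothetical and not part of any Drill API, and the 
BitSet-based bit layout is only a toy assumption standing in for whatever 
format a real engine would use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class BloomFilterJoinSketch {

    /** Portable description of a filter, readable above the SE layer (hypothetical). */
    static final class BloomFilterDefinition {
        final String engine;   // datasource type that built the filter, e.g. "hbase"
        final int numBits;
        final int numHashes;

        BloomFilterDefinition(String engine, int numBits, int numHashes) {
            this.engine = engine;
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        boolean compatibleWith(BloomFilterDefinition other) {
            return engine.equals(other.engine)
                && numBits == other.numBits
                && numHashes == other.numHashes;
        }
    }

    /** The filter itself: the coordination layer only passes the bytes around. */
    static final class OpaqueBloomFilter {
        final BloomFilterDefinition definition;
        final byte[] payload;

        OpaqueBloomFilter(BloomFilterDefinition definition, byte[] payload) {
            this.definition = definition;
            this.payload = payload;
        }
    }

    /** Hypothetical SE hook: filters are built and probed below the SE layer. */
    interface StorageEngine {
        OpaqueBloomFilter buildFilter(List<String> joinKeys);
        boolean mightContain(OpaqueBloomFilter filter, String key);
    }

    /** Toy engine; its bit layout is an assumption private to this class. */
    static final class ToyEngine implements StorageEngine {
        private final BloomFilterDefinition def;

        ToyEngine(String engineName) {
            this.def = new BloomFilterDefinition(engineName, 1024, 3);
        }

        // Derive the i-th bit position from two hashes (double hashing).
        private int bit(String key, int i) {
            int h1 = key.hashCode();
            int h2 = (h1 >>> 16) | 1;
            return Math.floorMod(h1 + i * h2, def.numBits);
        }

        @Override
        public OpaqueBloomFilter buildFilter(List<String> joinKeys) {
            BitSet bits = new BitSet(def.numBits);
            for (String key : joinKeys) {
                for (int i = 0; i < def.numHashes; i++) {
                    bits.set(bit(key, i));
                }
            }
            return new OpaqueBloomFilter(def, bits.toByteArray());
        }

        @Override
        public boolean mightContain(OpaqueBloomFilter filter, String key) {
            // Intra-datasource only: reject filters built by a different engine type.
            if (!def.compatibleWith(filter.definition)) {
                throw new IllegalArgumentException("filter from a different datasource");
            }
            BitSet bits = BitSet.valueOf(filter.payload);
            for (int i = 0; i < def.numHashes; i++) {
                if (!bits.get(bit(key, i))) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Coordination code above the SE layer: identical for every engine type. */
    static List<String> semiJoinCandidates(StorageEngine buildSide, StorageEngine probeSide,
                                           List<String> buildKeys, List<String> probeKeys) {
        OpaqueBloomFilter filter = buildSide.buildFilter(buildKeys);
        List<String> candidates = new ArrayList<>();
        for (String key : probeKeys) {
            if (probeSide.mightContain(filter, key)) {
                candidates.add(key);  // may include false positives, never misses a match
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        StorageEngine nodeA = new ToyEngine("hbase");
        StorageEngine nodeB = new ToyEngine("hbase");
        System.out.println(semiJoinCandidates(nodeA, nodeB,
            Arrays.asList("k1", "k2"), Arrays.asList("k1", "k2", "k9")));
    }
}
```

The coordination method never inspects the payload, so under these 
assumptions the same code would serve two HBase nodes or two Cassandra nodes, 
while a filter handed to an engine of a different type is rejected rather 
than misread.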
    
> Storage Engine: Define Java Interface
> -------------------------------------
>
>                 Key: DRILL-13
>                 URL: https://issues.apache.org/jira/browse/DRILL-13
>             Project: Apache Drill
>          Issue Type: Task
>            Reporter: Jacques Nadeau
>            Assignee: Jacques Nadeau
>
> We're going to need to define a storage engine API.  At a minimum, we'll need 
> to generate a Java one.  We will probably need to also create a CPP one.  
> This task is for the former.  Things that are likely to be included in the 
> Java interface are: reader (scanner), writer, capabilities interface, schema 
> interface, statistics interface, data layout and ordering.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
