[
https://issues.apache.org/jira/browse/DRILL-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Alves updated DRILL-13:
-----------------------------
Comment: was deleted
(was: My interest in BFs (Bloom filters) is not so much in advertising that
the underlying engine supports them for generic purposes (even though that
might be interesting for some obscure optimization choices); my interest is in
using them in large scale joins.
My assumption is that large scale joins will be composed of two parts, one
local part below the SE layer that handles node local data, and one part above
the SE layer that coordinates the join across cluster nodes.
Now of course in an ideal world we could have portable-format BFs that could
be used on semi-joins across datasource formats, but that is much harder than
what I'm proposing.
I'm proposing to start with a portable BF definition, where the BF itself
would be opaque and could only be used for intra-datasource joins (across hbase
nodes or across cassandra nodes, but not between hbase and cassandra).
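A minimal sketch of how a portable definition paired with an opaque filter
might look. Every name here (BloomFilterDefinition, OpaqueBloomFilter, the
engine, numBits and numHashFunctions fields) is hypothetical and chosen only
for illustration; none of it is an existing Drill API.

```java
/** Portable, engine-independent description of a Bloom filter (hypothetical). */
final class BloomFilterDefinition {
    final String engine;          // assumed identifier, e.g. "hbase" or "cassandra"
    final long numBits;           // size of the filter in bits
    final int numHashFunctions;   // number of hash functions used

    BloomFilterDefinition(String engine, long numBits, int numHashFunctions) {
        this.engine = engine;
        this.numBits = numBits;
        this.numHashFunctions = numHashFunctions;
    }
}

/** The filter itself: bytes that only the producing engine can interpret. */
final class OpaqueBloomFilter {
    final BloomFilterDefinition definition;
    private final byte[] payload;

    OpaqueBloomFilter(BloomFilterDefinition definition, byte[] payload) {
        this.definition = definition;
        this.payload = payload.clone();   // defensive copy, callers cannot mutate it
    }

    /** Raw bytes, to be shipped between nodes without being inspected. */
    byte[] payload() {
        return payload.clone();
    }

    /** Intra-datasource only: usable when the target engine matches the producer. */
    boolean usableBy(String targetEngine) {
        return definition.engine.equals(targetEngine);
    }
}
```

With this shape, a filter built on an hbase node would report usableBy("hbase")
as true and usableBy("cassandra") as false.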
Now I agree with your definition of what the real use cases are, but the join
coordination layer would still sit above the SE, which means that we could use
the same code for both hbase and cassandra at this layer, since we don't care
about the BF format. It would, however, need access to the BF definition and to
the BF itself in opaque form.
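To illustrate the point about the coordination layer, here is a rough sketch
under stated assumptions. StorageEngineNode, JoinCoordinator and ToyNode are
hypothetical names, and the toy single-hash 256-bit filter only stands in for
whatever a real engine would produce. The coordinator never looks inside the
filter bytes; it only checks that both sides belong to the same engine type.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-node handle exposed by a storage engine (SE). */
interface StorageEngineNode {
    String engineType();
    byte[] buildFilter();                     // filter over this node's local join keys
    List<String> probe(byte[] opaqueFilter);  // local keys that may match the filter
}

/** Sits above the SE layer and treats the filter as opaque bytes. */
final class JoinCoordinator {
    static List<String> semiJoin(StorageEngineNode build, List<StorageEngineNode> probes) {
        byte[] filter = build.buildFilter();
        List<String> candidates = new ArrayList<>();
        for (StorageEngineNode node : probes) {
            if (!node.engineType().equals(build.engineType())) {
                throw new IllegalArgumentException("opaque filter cannot cross engines");
            }
            candidates.addAll(node.probe(filter));
        }
        return candidates;
    }
}

/** Toy in-memory engine node, for demonstration only. */
final class ToyNode implements StorageEngineNode {
    private final String engineType;
    private final List<String> keys;

    ToyNode(String engineType, List<String> keys) {
        this.engineType = engineType;
        this.keys = keys;
    }

    public String engineType() {
        return engineType;
    }

    // Single hash function mapping a key to one of 256 bit positions.
    private static int bit(String key) {
        return key.hashCode() & 0xFF;
    }

    public byte[] buildFilter() {
        byte[] bits = new byte[32];
        for (String k : keys) {
            int b = bit(k);
            bits[b >> 3] |= (byte) (1 << (b & 7));
        }
        return bits;
    }

    public List<String> probe(byte[] filter) {
        List<String> out = new ArrayList<>();
        for (String k : keys) {
            int b = bit(k);
            if ((filter[b >> 3] & (1 << (b & 7))) != 0) {
                out.add(k);   // may include false positives, never false negatives
            }
        }
        return out;
    }
}
```

The same JoinCoordinator code would serve any engine that implements the
interface, which is the reuse argued for above. A join across engine types is
rejected rather than attempted.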
I know this is certainly not a design priority, but I do think BF definition
info would sit nicely with the partitioning info and would not require much
beyond that.
In any case I'll try and eat my dog food, i.e. output some code that
illustrates what I'm saying and maybe you can take a look and tell me what you
think.
Now all of this is a moot point if the consensus is that everything should
happen below the SE layer (i.e. in a multi phase join both phases happen under
the SE layer that just provides a reader abstraction to the joined data).
In this case I do think we'd be losing a good opportunity for reuse but, worse,
it would require a completely different implementation for
inter-datasource joins (e.g. joining data from Hbase and an RDBMS).)
> Storage Engine: Define Java Interface
> -------------------------------------
>
> Key: DRILL-13
> URL: https://issues.apache.org/jira/browse/DRILL-13
> Project: Apache Drill
> Issue Type: Task
> Reporter: Jacques Nadeau
> Assignee: Jacques Nadeau
>
> We're going to need to define a storage engine API. At a minimum, we'll need
> to generate a Java one. We will probably need to also create a CPP one.
> This task is for the former. Things that are likely to be included in the
> Java interface are: reader (scanner), writer, capabilities interface, schema
> interface, statistics interface, data layout and ordering.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira