[jira] [Comment Edited] (DRILL-13) Storage Engine: Define Java Interface

David Alves (JIRA) Tue, 12 Mar 2013 20:58:15 -0700

    [ https://issues.apache.org/jira/browse/DRILL-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600784#comment-13600784 ]


David Alves edited comment on DRILL-13 at 3/13/13 3:57 AM:
-----------------------------------------------------------

My thoughts/doubts on this:

Meta APIs:
I agree with the Statistics and Metadata APIs, but I think serialization and 
deserialization should be encapsulated within the storage engine itself.

Engine API:

- Takes over a subtree of the query plan (probably of the physical plan, once 
logical plan operators have been divided into partitions, for instance) based 
on the engine's capabilities.
   ex: HBase can take filters, projections and partial aggregations, and push 
them down below the HBase Scan operator.
   ex: if we add an RDBMS engine later, we can rebuild the portion of the plan 
that can be pushed down in SQL form and run it as is.

- Provides a Reader/Scanner interface for scanning the results of executing the 
subtree
- Provides a Writer interface for outputting intermediate or final results.
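
To make the three points above concrete, here is a minimal Java sketch. Every 
name in it (StorageEngineSketch, OpType, RecordReader, RecordWriter, 
StorageEngine, pushable, hbaseLike) is hypothetical and not part of any 
existing Drill API. It also assumes the plan fragment is a simple chain of 
operators starting at the scan, whereas a real plan would be a tree.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class StorageEngineSketch {

    /** Hypothetical operator kinds; a real plan would carry full operator definitions. */
    public enum OpType { SCAN, FILTER, PROJECT, PARTIAL_AGGREGATE, JOIN }

    /** Reader/Scanner over the results of executing the pushed-down subtree. */
    public interface RecordReader extends Iterator<String> { }

    /** Writer for intermediate or final results. */
    public interface RecordWriter {
        void write(String record);
    }

    /** Hypothetical engine API: capabilities, subtree execution, and a writer. */
    public interface StorageEngine {
        Set<OpType> capabilities();
        RecordReader execute(List<OpType> subtree);
        RecordWriter writer();
    }

    /**
     * Given an operator chain that starts at the scan, returns the longest
     * prefix the engine can take over, based on its declared capabilities.
     */
    public static List<OpType> pushable(StorageEngine engine, List<OpType> chainFromScan) {
        List<OpType> taken = new ArrayList<>();
        for (OpType op : chainFromScan) {
            if (!engine.capabilities().contains(op)) {
                break; // this operator and everything above it stay in Drill
            }
            taken.add(op);
        }
        return taken;
    }

    /** Toy in-memory engine with HBase-like capabilities, for illustration only. */
    public static StorageEngine hbaseLike(final List<String> rows, final List<String> sink) {
        return new StorageEngine() {
            public Set<OpType> capabilities() {
                return EnumSet.of(OpType.SCAN, OpType.FILTER,
                                  OpType.PROJECT, OpType.PARTIAL_AGGREGATE);
            }
            public RecordReader execute(List<OpType> subtree) {
                // A real engine would translate the subtree into native calls;
                // here we just iterate over the in-memory rows.
                final Iterator<String> it = rows.iterator();
                return new RecordReader() {
                    public boolean hasNext() { return it.hasNext(); }
                    public String next() { return it.next(); }
                };
            }
            public RecordWriter writer() {
                return new RecordWriter() {
                    public void write(String record) { sink.add(record); }
                };
            }
        };
    }
}
```

With this toy engine, a chain of SCAN, FILTER, JOIN, PROJECT would hand only 
SCAN and FILTER to the engine, because the prefix stops at the first operator 
the engine does not declare support for.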

Questions/Doubts:

- Storage engines have a local interface accessible within the Drill daemon that 
is colocated with the underlying system's daemons, correct?
   - specifically I mean that for NoSQL stores like Cassandra or HBase there 
will be a local daemon in each node that does inter-process communication with 
the underlying store and provides information on the local partitions so that 
the query planner can take that into account.
- We will need a meta store for ad-hoc schema matching/schema caching, correct?
  - While Cassandra has a schema that is easy to use and read when queried with 
CQL3, it would probably be useful to store the schema of certain HBase tables so 
that their values can be returned in some form other than byte[]s. The user 
would be responsible for maintaining this.
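
As an illustration of the meta store idea, here is a minimal Java sketch of a 
user-maintained schema registry that turns raw byte[] values into typed ones. 
The names (SchemaStoreSketch, ColumnType, register, decode) are hypothetical. 
The sketch also assumes numeric values were written as fixed-width big-endian 
bytes (the default order of java.nio.ByteBuffer) and strings as UTF-8; an 
actual table may use a different encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class SchemaStoreSketch {

    /** Hypothetical set of value types a user can declare for a column. */
    public enum ColumnType { LONG, DOUBLE, STRING }

    // Keyed by "table/column"; the user is responsible for keeping this current.
    private final Map<String, ColumnType> types = new HashMap<>();

    private static String key(String table, String column) {
        return table + "/" + column;
    }

    /** Records the declared type of a column. */
    public void register(String table, String column, ColumnType type) {
        types.put(key(table, column), type);
    }

    /**
     * Decodes a raw value using the registered type. Columns without a
     * registered type are returned unchanged as byte[].
     */
    public Object decode(String table, String column, byte[] raw) {
        ColumnType type = types.get(key(table, column));
        if (type == null) {
            return raw; // no schema known: fall back to raw bytes
        }
        switch (type) {
            case LONG:
                return ByteBuffer.wrap(raw).getLong();   // assumes 8 big-endian bytes
            case DOUBLE:
                return ByteBuffer.wrap(raw).getDouble(); // assumes 8 big-endian bytes
            case STRING:
                return new String(raw, StandardCharsets.UTF_8);
            default:
                return raw;
        }
    }
}
```

Returning the raw bytes for unregistered columns keeps the store optional: 
tables without a stored schema behave exactly as they would without it.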






                
      was (Author: dr-alves):
    My thoughts/doubts on this:

Meta APIs:
I agree with the Statistics and Metadata APIs, but I think serialization and 
deserialization should be encapsulated within the storage engine itself.

Engine API:

- Takes over a subtree of the query plan (probably of the physical plan, once 
logical plan operators have been divided into partitions, for instance) based 
on the engine's capabilities.
   ex: HBase can take filters, projections and partial aggregations, and push 
them down below the HBase Scan operator.
   ex: if we add an RDBMS engine later, we can rebuild the portion of the plan 
that can be pushed down in SQL form and run it as is.

- Provides a Reader/Scanner interface for scanning the results of executing the 
subtree
- Provides a Writer interface for outputting intermediate or final results.

Questions/Doubts:

- Storage engine daemons are supposed to be colocated with the underlying 
system's daemons, correct?
   - specifically I mean that for NoSQL stores like Cassandra or HBase there 
will be a local daemon in each node that does inter-process communication with 
the underlying store and provides information on the local partitions so that 
the query planner can take that into account.
- We will need a meta store for ad-hoc schema matching/schema caching, correct?
  - While Cassandra has a schema that is easy to use and read when queried with 
CQL3, it would probably be useful to store the schema of certain HBase tables so 
that their values can be returned in some form other than byte[]s. The user 
would be responsible for maintaining this.






                  
> Storage Engine: Define Java Interface
> -------------------------------------
>
>                 Key: DRILL-13
>                 URL: https://issues.apache.org/jira/browse/DRILL-13
>             Project: Apache Drill
>          Issue Type: Task
>            Reporter: Jacques Nadeau
>            Assignee: Jacques Nadeau
>
> We're going to need to define a storage engine API.  At a minimum, we'll need 
> to generate a Java one.  We will probably need to also create a CPP one.  
> This task is for the former.  Things that are likely to be included in the 
> Java interface are: reader (scanner), writer, capabilities interface, schema 
> interface, statistics interface, data layout and ordering.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
