[ 
https://issues.apache.org/jira/browse/DRILL-15?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisen Mu updated DRILL-15:
--------------------------

    Attachment: DRILL-15.metastore.patch.txt

I cut the related part from our code base. This patch is pretty far from a 
ready-to-merge state, but I guess it won't be merged anyway :) It just shows the 
idea and the problem.

We are using the Hive metastore to represent the schema information of HTables. 
The metastore holds three kinds of information:

* WHAT the logical fields look like. From the user's perspective, a record may 
have fields of different types such as integer, date, boolean or text. This also 
indicates how Drill would process the fields in memory.

* WHERE the logical fields are stored in the HTable. There are many places in an 
HTable where information can be stored: the rowkey; the qualifier name of a 
particular CF; the value under a particular CF:qualifier; or even the version 
number of a particular cell. The value of a logical field can be stored in any 
of the above. Further, a rowkey may contain multiple logical fields. This 
depends heavily on how users design their storage schema.

* HOW the logical fields are stored. HBase basically provides storage for byte[], 
so the HTable scanner needs to know how fields such as integer, date and boolean 
are serialized as byte[]. For example, 255 could be serialized as \xFF with 
BINARY:1byte, as [FF 00 00 00] with BINARY:4byte, as "255" with TEXT (variable 
length), or as "00000255" with TEXT (fixed length: 8). Another example is a 
logical DATE, which could be stored as [AC 36 FC 51] with BINARY:4byte (by way 
of the integer 1375483564), or as "20130803" with TEXT (fixed length: 8).
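
To make the HOW part concrete, here is a minimal sketch of the encodings listed 
above. It is not taken from the patch: the class and method names are 
hypothetical, and the least-significant-byte-first order is an assumption 
inferred from the [FF 00 00 00] and [AC 36 FC 51] examples.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of the patch.
class FieldEncodingSketch {

    // BINARY:<width>byte, least significant byte first (assumed from the examples).
    static byte[] binaryLittleEndian(long value, int width) {
        if (width < 1 || width > 8) {
            throw new IllegalArgumentException("width must be 1..8");
        }
        byte[] out = new byte[width];
        for (int i = 0; i < width; i++) {
            out[i] = (byte) (value >>> (8 * i));
        }
        return out;
    }

    // TEXT with variable length: the decimal digits as ASCII.
    static byte[] textVariable(long value) {
        return Long.toString(value).getBytes(StandardCharsets.US_ASCII);
    }

    // TEXT with fixed length: decimal digits, zero-padded on the left.
    static byte[] textFixed(long value, int length) {
        String s = String.format(Locale.ROOT, "%0" + length + "d", value);
        if (s.length() != length) {
            throw new IllegalArgumentException("value does not fit in " + length + " chars");
        }
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Renders bytes as space-separated upper-case hex, e.g. "FF 00 00 00".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format(Locale.ROOT, "%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(binaryLittleEndian(255, 1)));          // FF
        System.out.println(hex(binaryLittleEndian(255, 4)));          // FF 00 00 00
        System.out.println(new String(textVariable(255), StandardCharsets.US_ASCII)); // 255
        System.out.println(new String(textFixed(255, 8), StandardCharsets.US_ASCII)); // 00000255
        System.out.println(hex(binaryLittleEndian(1375483564L, 4)));  // AC 36 FC 51
    }
}
```

The same logical value yields four different byte[] layouts, which is why the 
scanner cannot decode or compare a field without knowing its encoding.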

The meta definition is in com.xingcloud.meta.HBaseFieldInfo.java
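
For readers without the patch at hand, the three kinds of information could be 
carried by a descriptor along the following lines. This is a hypothetical 
sketch, not the actual HBaseFieldInfo: every name and field below is an 
assumption made for illustration.

```java
// Hypothetical sketch only; the real definition lives in the patch.
class FieldMappingSketch {
    enum LogicalType { INTEGER, DATE, BOOLEAN, TEXT }                    // WHAT
    enum Location { ROWKEY, QUALIFIER_NAME, CELL_VALUE, CELL_VERSION }   // WHERE
    enum Encoding { BINARY, TEXT }                                       // HOW

    final String name;
    final LogicalType type;
    final Location location;
    final Encoding encoding;
    final int length; // serialized length in bytes; 0 means variable length

    FieldMappingSketch(String name, LogicalType type, Location location,
                       Encoding encoding, int length) {
        this.name = name;
        this.type = type;
        this.location = location;
        this.encoding = encoding;
        this.length = length;
    }

    // One-line summary of what the field is, where it lives and how it is encoded.
    String describe() {
        String len = length > 0 ? "fixed:" + length : "variable";
        return name + " " + type + " @" + location + " as " + encoding + "(" + len + ")";
    }

    public static void main(String[] args) {
        FieldMappingSketch date = new FieldMappingSketch(
                "date", LogicalType.DATE, Location.ROWKEY, Encoding.TEXT, 8);
        System.out.println(date.describe()); // date DATE @ROWKEY as TEXT(fixed:8)
    }
}
```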

This information is used in the HBase scanner to generate the most effective 
scan (mapping a logical Filter to HBase's filter classes, and choosing the 
startKey and endKey that scan the least data), and in the conversion from 
LogicalPlan to PhysicalPlan, to generate the correct ReadEntry for HBase.
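
As a rough illustration of the startKey/endKey point, suppose (purely as an 
assumption) that the rowkey begins with the date stored as TEXT with fixed 
length 8, e.g. "20130803". A date-range filter can then be turned into a key 
range instead of a full table scan. The sketch below uses plain byte[] and 
hypothetical names rather than the patch's classes; it relies only on the 
HBase convention that the start row is inclusive and the stop row is 
exclusive.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch; assumes the rowkey starts with a fixed-length TEXT date.
class ScanRangeSketch {

    // Smallest key that sorts after every key starting with the given prefix.
    // Returns an empty array (meaning "no upper bound") if the prefix is all 0xFF.
    static byte[] prefixUpperBound(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] out = Arrays.copyOf(prefix, i + 1);
                out[i]++;
                return out;
            }
        }
        return new byte[0];
    }

    // Maps "fromDate <= date <= toDate" to {startKey (inclusive), stopKey (exclusive)}.
    static byte[][] rangeForDates(String fromDate, String toDate) {
        byte[] start = fromDate.getBytes(StandardCharsets.US_ASCII);
        byte[] stop = prefixUpperBound(toDate.getBytes(StandardCharsets.US_ASCII));
        return new byte[][] { start, stop };
    }

    public static void main(String[] args) {
        byte[][] range = rangeForDates("20130801", "20130803");
        System.out.println(new String(range[0], StandardCharsets.US_ASCII)); // 20130801
        System.out.println(new String(range[1], StandardCharsets.US_ASCII)); // 20130804
    }
}
```

This only works because fixed-length TEXT sorts the same way as the dates 
themselves; with the least-significant-byte-first BINARY:4byte layout from 
the earlier example, byte order would not match numeric order, and a range 
like this would be wrong. That is the kind of decision the HOW metadata 
is needed for.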


I do understand that a strong schema is not Drill's primary concern; however, I 
think other approaches to the HBase scanner also have to solve the problems 
above to work correctly.


                
> Build HBase storage engine implementation
> -----------------------------------------
>
>                 Key: DRILL-15
>                 URL: https://issues.apache.org/jira/browse/DRILL-15
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: David Alves
>         Attachments: DRILL-15-02.patch, DRILL-15-03.patch, 
> DRILL-15.metastore.patch.txt, DRILL-15.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
