[
https://issues.apache.org/jira/browse/DRILL-15?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lisen Mu updated DRILL-15:
--------------------------
Attachment: DRILL-15.metastore.patch.txt
I cut the related part from our code base. This patch is pretty far from a
ready-to-merge state, but I guess it won't be merged anyway :) It just shows
the idea and the problem.
We are using the Hive metastore to represent the schema information of HTables.
The metastore holds 3 kinds of information:
* WHAT the logical fields look like. From the user's perspective, a record may
have different fields such as integer, date, boolean or text. This also
indicates how Drill would process the fields in memory.
* WHERE the logical fields are stored in the HTable. There are many places in
an HTable in which information can be stored: the rowkey; the qualifier name of
a particular CF; the value under a particular CF:qualifier; or even the version
number of a particular cell. The value of a logical field can be stored in any
of the above. Further, the rowkey may contain multiple logical fields. This
depends heavily on how users design their storage schema.
* HOW the logical fields are stored. HBase basically provides storage for
byte[], so the HTable scanner needs to know how fields such as integer, date
and boolean are serialized as byte[]. For example, 255 would be serialized to
\xFF as BINARY:1byte, or [FF 00 00 00] as BINARY:4byte, or "255" as TEXT (with
variable length), or "00000255" as TEXT (with fixed length:8). Another example
would be a logical DATE serialized (first to the integer 1375483564, then) to
[AC 36 FC 51] as BINARY:4byte, or to "20130803" as TEXT (with fixed length:8).
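The encodings in the HOW examples above can be reproduced with a short sketch.
This is illustrative Python, not code from the patch: the helper names are
hypothetical, and the little-endian byte order and the UTC+8 time zone are
assumptions read off the example bytes and the example date.

```python
import struct
from datetime import datetime, timedelta, timezone

# Assumption: UTC+8. The epoch value 1375483564 only falls on 2013-08-03
# in zones east of UTC, so some such zone must have been used.
ASSUMED_TZ = timezone(timedelta(hours=8))


def encode_int(value, fmt, width=None):
    """Serialize a logical integer as BINARY:<width>byte or as TEXT."""
    if fmt == "BINARY":
        # Little-endian is an assumption inferred from [FF 00 00 00].
        return value.to_bytes(width, "little")
    if fmt == "TEXT":
        text = str(value)
        # Fixed-length text is left-padded with zeros; variable-length is not.
        return (text.zfill(width) if width else text).encode("ascii")
    raise ValueError(f"unknown format: {fmt}")


def encode_date(epoch_seconds, fmt):
    """Serialize a logical DATE as BINARY:4byte or as TEXT (fixed length:8)."""
    if fmt == "BINARY":
        return struct.pack("<I", epoch_seconds)  # 4 bytes, little-endian
    if fmt == "TEXT":
        moment = datetime.fromtimestamp(epoch_seconds, ASSUMED_TZ)
        return moment.strftime("%Y%m%d").encode("ascii")
    raise ValueError(f"unknown format: {fmt}")


if __name__ == "__main__":
    for encoded in (
        encode_int(255, "BINARY", 1),
        encode_int(255, "BINARY", 4),
        encode_int(255, "TEXT"),
        encode_int(255, "TEXT", 8),
        encode_date(1375483564, "BINARY"),
        encode_date(1375483564, "TEXT"),
    ):
        print(encoded)
```

The same logical value yields four different byte strings for the integer and
two for the date, which is why the scanner cannot interpret a cell without
knowing the encoding.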
The meta definition is in com.xingcloud.meta.HBaseFieldInfo.java.
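The patch is not reproduced in this message, so the following is only a sketch
of what a descriptor carrying the WHAT / WHERE / HOW information could look
like. Every class, field and column name here (including the "uid" field) is
hypothetical and is not taken from HBaseFieldInfo.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FieldType(Enum):  # WHAT: the logical type seen by the user
    INTEGER = "integer"
    DATE = "date"
    BOOLEAN = "boolean"
    TEXT = "text"


class Location(Enum):  # WHERE: the place in the HTable holding the value
    ROWKEY = "rowkey"
    QUALIFIER_NAME = "qualifier_name"
    CELL_VALUE = "cell_value"
    CELL_VERSION = "cell_version"


class Encoding(Enum):  # HOW: the byte[] serialization
    BINARY = "binary"
    TEXT = "text"


@dataclass(frozen=True)
class FieldInfo:
    name: str
    field_type: FieldType
    location: Location
    encoding: Encoding
    width: Optional[int] = None  # None means variable length
    family: Optional[str] = None
    qualifier: Optional[str] = None

    def __post_init__(self):
        if self.encoding is Encoding.BINARY and self.width is None:
            raise ValueError("BINARY fields need a fixed width")
        if self.location is Location.CELL_VALUE and not (self.family and self.qualifier):
            raise ValueError("cell values need a CF and a qualifier")


# Hypothetical rowkey holding two logical fields: date as TEXT:8, uid as BINARY:4.
ROWKEY_LAYOUT = [
    FieldInfo("date", FieldType.DATE, Location.ROWKEY, Encoding.TEXT, width=8),
    FieldInfo("uid", FieldType.INTEGER, Location.ROWKEY, Encoding.BINARY, width=4),
]


def rowkey_offsets(layout):
    """Return the byte offset of each fixed-width rowkey field, in order."""
    offsets, position = {}, 0
    for field in layout:
        if field.width is None:
            raise ValueError("offsets need fixed-width fields")
        offsets[field.name] = position
        position += field.width
    return offsets
```

With this layout, rowkey_offsets(ROWKEY_LAYOUT) places "date" at byte 0 and
"uid" at byte 8, which is the kind of information a scanner needs to pull
several logical fields out of one rowkey.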
This information will be used in the HBase scanner to generate the most
efficient scan (mapping the logical Filter to HBase's filter classes, and
choosing the startKey and endKey so that the least data is scanned), and in the
conversion from LogicalPlan to PhysicalPlan, to generate the correct ReadEntry
for HBase.
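As a sketch of the startKey / endKey step, assume a rowkey that begins with the
date stored as TEXT (fixed length:8). A logical filter "date between A and B"
can then be turned into a key range without touching any row outside it. The
function names are hypothetical, and the code assumes the usual HBase scan
semantics in which the stop key is exclusive and an empty stop key means "scan
to the end of the table".

```python
def next_prefix(prefix: bytes) -> bytes:
    """Smallest byte string greater than every key starting with prefix."""
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        # Nothing sorts after an all-0xFF prefix, so there is no upper bound.
        return b""
    # Increment the last byte that is not 0xFF and drop everything after it.
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def date_range_to_keys(first_day: str, last_day: str):
    """Map an inclusive date range (YYYYMMDD) to (startKey, exclusive endKey)."""
    start_key = first_day.encode("ascii")
    stop_key = next_prefix(last_day.encode("ascii"))
    return start_key, stop_key


if __name__ == "__main__":
    print(date_range_to_keys("20130801", "20130803"))
```

This only works because the fixed-length text encoding sorts in the same order
as the dates themselves. With variable-length text the byte order and the
logical order disagree (b"255" sorts after b"1000"), so the scanner would need
the encoding information to know that no tight key range exists.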
I do understand that a strong schema is not Drill's primary concern. However, I
think other approaches to the HBase scanner would also have to solve the
problems above to work correctly.
> Build HBase storage engine implementation
> -----------------------------------------
>
> Key: DRILL-15
> URL: https://issues.apache.org/jira/browse/DRILL-15
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: David Alves
> Attachments: DRILL-15-02.patch, DRILL-15-03.patch,
> DRILL-15.metastore.patch.txt, DRILL-15.patch
>
>