[ https://issues.apache.org/jira/browse/HBASE-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955877#comment-13955877 ]
Nick Dimiduk commented on HBASE-10091:
--------------------------------------
I haven't worked through a prototype yet, so I don't know exactly. The DSL we
have for exposing filters is parsed once, in Java (using
[ParseFilter|https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ParseFilter.html]),
by the shell or the Thrift service (I believe the REST service doesn't support
it yet). Likewise, the user would provide the type mapping as a configuration
string, and whatever component interacts with the HTable would be responsible
for passing the provided data literals to the correct DataType instances.
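A minimal sketch of that flow, assuming a hypothetical configuration format such as {{cf:a=int64,cf:b=string}}: the string is parsed once into a column-to-codec map, and each literal arriving from the shell or Thrift is routed to its column's codec. {{LiteralCodec}}, {{TypeMapping}}, and the type names are illustrative stand-ins, not HBase API. The int64 encoding (big-endian with the sign bit flipped) is just one simple order-preserving choice and is not claimed to match the byte layout of HBase's {{OrderedInt64}}.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an HBase DataType: encodes a user-supplied literal.
interface LiteralCodec {
  byte[] encodeLiteral(String literal);
}

// Hypothetical column-to-type mapping, built once from a configuration string.
final class TypeMapping {
  // Built-in codecs keyed by illustrative type names (not HBase identifiers).
  private static final Map<String, LiteralCodec> CODECS = new HashMap<>();

  static {
    // Big-endian with the sign bit flipped, so negative values sort before
    // positive ones under unsigned byte-wise comparison.
    CODECS.put("int64",
        s -> ByteBuffer.allocate(8).putLong(Long.parseLong(s) ^ Long.MIN_VALUE).array());
    CODECS.put("string", s -> s.getBytes(StandardCharsets.UTF_8));
  }

  private final Map<String, LiteralCodec> byColumn = new HashMap<>();

  // Parses e.g. "cf:a=int64,cf:b=string"; unknown type names fail fast.
  static TypeMapping parse(String config) {
    TypeMapping mapping = new TypeMapping();
    for (String entry : config.split(",")) {
      String[] kv = entry.trim().split("=", 2);
      if (kv.length != 2) {
        throw new IllegalArgumentException("expected column=type: " + entry);
      }
      LiteralCodec codec = CODECS.get(kv[1].trim());
      if (codec == null) {
        throw new IllegalArgumentException("unknown type: " + kv[1]);
      }
      mapping.byColumn.put(kv[0].trim(), codec);
    }
    return mapping;
  }

  // Routes a literal (e.g. from the shell or a Thrift call) to its column's codec.
  byte[] encode(String column, String literal) {
    LiteralCodec codec = byColumn.get(column);
    if (codec == null) {
      throw new IllegalArgumentException("no type configured for column: " + column);
    }
    return codec.encodeLiteral(literal);
  }
}
```

Parsing happens once per configuration; the per-literal path is just a map lookup plus the codec call.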
One example consumer is the Hive metastore. A table is defined in the metastore
with a column mapping, similar to today's, that maps each metastore table column
to an HBase table column. In addition to the column mapping, a type
specification is also provided; this would be an Expression in the DSL we're
discussing. The StorageHandler would be responsible for honoring this
additional component of the mapping. How exactly we ensure the metastore type
can be converted to/from the HBase {{DataType}} is still an open question. I
hope to learn from Phoenix on this, which is why I deferred that work to
HBASE-8863.
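To make the metastore case concrete, here is a sketch of a storage handler pairing each mapped column with a type expression. It assumes the Hive column names arrive as the comma-separated {{columns}} table property and the existing mapping as {{hbase.columns.mapping}}. {{hbase.columns.types}} is a hypothetical property, not an existing Hive setting; it is semicolon-separated here so that type expressions may themselves contain commas. The expressions are carried through unparsed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// One resolved column: Hive column, HBase column, and its (unparsed) DSL type expression.
final class ColumnBinding {
  final String hiveColumn;
  final String hbaseColumn;
  final String typeExpr;

  ColumnBinding(String hiveColumn, String hbaseColumn, String typeExpr) {
    this.hiveColumn = hiveColumn;
    this.hbaseColumn = hbaseColumn;
    this.typeExpr = typeExpr;
  }
}

final class TypedColumnMapping {
  // Hypothetical property carrying one type expression per mapped column.
  static final String TYPES_PROPERTY = "hbase.columns.types";

  static List<ColumnBinding> resolve(Properties tbl) {
    List<String> hive = split(tbl, "columns", ",");
    List<String> hbase = split(tbl, "hbase.columns.mapping", ",");
    List<String> types = split(tbl, TYPES_PROPERTY, ";");
    if (hive.size() != hbase.size() || hbase.size() != types.size()) {
      throw new IllegalArgumentException("columns, mapping and types must have the same length");
    }
    List<ColumnBinding> out = new ArrayList<>();
    for (int i = 0; i < hive.size(); i++) {
      out.add(new ColumnBinding(hive.get(i), hbase.get(i), types.get(i)));
    }
    return out;
  }

  // Reads a required property and splits it into trimmed entries.
  private static List<String> split(Properties tbl, String key, String sep) {
    String value = tbl.getProperty(key);
    if (value == null) {
      throw new IllegalArgumentException("missing table property: " + key);
    }
    List<String> parts = new ArrayList<>();
    for (String p : value.split(sep)) {
      parts.add(p.trim());
    }
    return parts;
  }
}
```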
More concretely, I imagine this DSL is relatively simple. A complete type
definition might be as simple as {{package.class\[/ORDER\]}}. We'll need to add
whatever API {{DataType}} needs to support construction from the parser.
There may also be some built-in named definitions, such as "raw" or
"ordered-bytes", for which we ship an existing, known mapping between Java
types and HBase DataType implementations. This would be a convenience for
consumers of HTable; I don't know how it would play into a metastore
implementation.
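A minimal parser for the {{package.class\[/ORDER\]}} form, plus the two named definitions, could look like the following. It only produces a parsed spec; actually constructing a {{DataType}} would need the construction API that still has to be added. The alias tables are illustrative guesses at what "raw" and "ordered-bytes" might map to, using class names from the {{org.apache.hadoop.hbase.types}} package; {{TypeSpec}} and {{TypeSpecParser}} are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parse result for one type definition.
final class TypeSpec {
  // Mirrors the ASCENDING/DESCENDING values of HBase's Order enum.
  enum Order { ASCENDING, DESCENDING }

  final String className;
  final Order order; // null when the definition leaves the order unspecified

  TypeSpec(String className, Order order) {
    this.className = className;
    this.order = order;
  }
}

final class TypeSpecParser {
  // package.class[/ORDER], e.g. org.apache.hadoop.hbase.types.OrderedInt64/DESCENDING
  private static final Pattern SPEC = Pattern.compile(
      "((?:[A-Za-z_$][\\w$]*\\.)*[A-Za-z_$][\\w$]*)(?:/(ASCENDING|DESCENDING))?");

  // Illustrative built-in named definitions: Java type -> DataType class name.
  private static final Map<String, Map<Class<?>, String>> NAMED = Map.of(
      "raw", Map.<Class<?>, String>of(
          Long.class, "org.apache.hadoop.hbase.types.RawLong",
          String.class, "org.apache.hadoop.hbase.types.RawString"),
      "ordered-bytes", Map.<Class<?>, String>of(
          Long.class, "org.apache.hadoop.hbase.types.OrderedInt64",
          String.class, "org.apache.hadoop.hbase.types.OrderedString"));

  // javaType is only consulted for named definitions.
  static TypeSpec parse(String definition, Class<?> javaType) {
    String def = definition.trim();
    Map<Class<?>, String> named = NAMED.get(def);
    if (named != null) {
      String className = named.get(javaType);
      if (className == null) {
        throw new IllegalArgumentException(def + " has no mapping for " + javaType.getName());
      }
      return new TypeSpec(className, null);
    }
    Matcher m = SPEC.matcher(def);
    if (!m.matches()) {
      throw new IllegalArgumentException("not a type definition: " + definition);
    }
    String order = m.group(2);
    return new TypeSpec(m.group(1), order == null ? null : TypeSpec.Order.valueOf(order));
  }
}
```

Named definitions are checked before the class-name grammar, so an alias always wins over a bare class of the same name.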
The only potential overlap with Avro/Protobuf is
[Struct|http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/types/Struct.html].
I'm not convinced this is very complicated either: it's just a sequence of
types, with syntax for marking an element as optional. There's no concept of
"schema versioning" in {{Struct}}, and there's no room for it where encoded
ordering is the primary concern.
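As a sketch of how small the {{Struct}} grammar could be, the parser below reads a hypothetical {{struct(T1, T2, T3?)}} syntax, where a trailing {{?}} marks an optional member and members may themselves be nested structs. The syntax and the {{StructMember}}/{{StructSpecParser}} names are assumptions; member type expressions are returned unparsed.

```java
import java.util.ArrayList;
import java.util.List;

// One member of a hypothetical struct(...) type expression.
final class StructMember {
  final String typeDef;   // the member's own type expression, unparsed
  final boolean optional; // true when the member was written with a trailing '?'

  StructMember(String typeDef, boolean optional) {
    this.typeDef = typeDef;
    this.optional = optional;
  }
}

final class StructSpecParser {
  private static final String PREFIX = "struct(";

  // Parses e.g. "struct(OrderedInt64/ASCENDING, OrderedString, OrderedString?)".
  static List<StructMember> parse(String expr) {
    String s = expr.trim();
    if (!s.startsWith(PREFIX) || !s.endsWith(")")) {
      throw new IllegalArgumentException("expected struct(...): " + expr);
    }
    String body = s.substring(PREFIX.length(), s.length() - 1);
    List<StructMember> members = new ArrayList<>();
    int depth = 0;
    int start = 0;
    // Split on top-level commas only, so nested struct(...) members stay intact.
    for (int i = 0; i < body.length(); i++) {
      char c = body.charAt(i);
      if (c == '(') {
        depth++;
      } else if (c == ')') {
        if (--depth < 0) throw new IllegalArgumentException("unbalanced ')': " + expr);
      } else if (c == ',' && depth == 0) {
        members.add(member(body.substring(start, i), expr));
        start = i + 1;
      }
    }
    if (depth != 0) throw new IllegalArgumentException("unbalanced '(': " + expr);
    members.add(member(body.substring(start), expr));
    return members;
  }

  // Strips an optional '?' marker and rejects empty members.
  private static StructMember member(String raw, String expr) {
    String t = raw.trim();
    boolean optional = t.endsWith("?");
    if (optional) t = t.substring(0, t.length() - 1).trim();
    if (t.isEmpty()) throw new IllegalArgumentException("empty struct member in: " + expr);
    return new StructMember(t, optional);
  }
}
```

Whether optional members may appear anywhere or only at the end is deliberately left open in this sketch.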
> Exposing HBase DataTypes to non-Java interfaces
> -----------------------------------------------
>
> Key: HBASE-10091
> URL: https://issues.apache.org/jira/browse/HBASE-10091
> Project: HBase
> Issue Type: Sub-task
> Components: Client
> Reporter: Nick Dimiduk
>
> Access to the DataType implementations introduced in HBASE-8693 is currently
> limited to consumers of the Java API. It is not easy to specify a data type
> in non-Java environments, such as the HBase shell, REST or Thrift Gateways,
> command-line arguments to our utility MapReduce jobs, or in integration
> points such as a (hypothetical extension to) Hive's HBaseStorageHandler. See
> HBASE-8593 and HBASE-10071 for examples where this limitation gets in the way.
> I propose implementing a type definition DSL, similar to the language
> defined for Filters in HBASE-4176. Implemented in core HBase, it could be
> reused in all of the situations described above. The parser for this DSL
> must support arbitrary type extensions, just as the Filter parser allows new
> Filter types to be registered at runtime.
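The runtime-extension requirement in the issue description could be met with a name-to-class registry in the spirit of the Filter parser's {{ParseFilter.registerFilter(name, className)}} hook. This sketch is hypothetical ({{TypeRegistry}} is not an HBase class) and instantiates only through a public no-arg constructor; real {{DataType}} implementations would probably need a richer construction path.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry of type names, modelled on how the Filter parser maps
// filter names to class names; new entries may be added at runtime.
final class TypeRegistry {
  private static final Map<String, String> TYPES = new ConcurrentHashMap<>();

  // Registers an extension type; refuses to silently replace an existing name.
  static void register(String name, String className) {
    if (TYPES.putIfAbsent(name, className) != null) {
      throw new IllegalArgumentException("type already registered: " + name);
    }
  }

  static boolean isRegistered(String name) {
    return TYPES.containsKey(name);
  }

  // Looks up the class by name and instantiates it via its public no-arg constructor.
  static Object newInstance(String name) throws ReflectiveOperationException {
    String className = TYPES.get(name);
    if (className == null) {
      throw new IllegalArgumentException("unknown type: " + name);
    }
    return Class.forName(className).getDeclaredConstructor().newInstance();
  }
}
```

Using {{putIfAbsent}} keeps concurrent registrations from racing each other into a silent overwrite.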