[ https://issues.apache.org/jira/browse/IMPALA-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quanlong Huang resolved IMPALA-886.
-----------------------------------
Fix Version/s: Impala 4.2.0
Resolution: Fixed
> Always display HBase cols in same order as CREATE TABLE statement
> -----------------------------------------------------------------
>
> Key: IMPALA-886
> URL: https://issues.apache.org/jira/browse/IMPALA-886
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Affects Versions: Impala 1.3
> Reporter: John Russell
> Assignee: Csaba Ringhofer
> Priority: Minor
> Labels: catalog-server, hbase, usability
> Fix For: Impala 4.2.0
>
>
> I noticed a discrepancy with Hive, in how Impala handles column order for
> HBase tables.
> I think it would be preferable to use the same behavior as Hive, otherwise
> life becomes more complicated for anyone doing INSERT or SELECT * with an
> HBase table through Impala.
> (And I have to add caveats and usage notes in the docs.)
> Repro:
> In HBase shell, create a table with a single column family. I think most
> Impala tests use 1 column family per column, where you won't notice this
> behavior.
> hbase(main):008:0> create 'sample_data_fast','cols'
> 0 row(s) in 71.8750 seconds
> In Hive shell, create a mapping table. Notice how DESCRIBE repeats back the
> columns in the same order as in CREATE TABLE.
> hive> create external table sample_data_fast (id string, val int, zfill string, name string, assertion boolean)
> > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> > WITH SERDEPROPERTIES (
> > "hbase.columns.mapping" = ":key,cols:val,cols:zfill,cols:name,cols:assertion")
> > TBLPROPERTIES("hbase.table.name" = "sample_data_fast")
> > ;
> OK
> Time taken: 1.7 seconds
> hive> desc sample_data_fast;
> OK
> id string from deserializer
> val int from deserializer
> zfill string from deserializer
> name string from deserializer
> assertion boolean from deserializer
> Time taken: 0.302 seconds
> Now try the same DESCRIBE in impala-shell. The key column (id) is listed
> first. Then all the other columns, part of the same column family, are listed
> in alphabetical order rather than the order from CREATE TABLE:
> [localhost:21000] > desc sample_data_fast;
> Query: describe sample_data_fast
> +-----------+---------+---------+
> | name      | type    | comment |
> +-----------+---------+---------+
> | id        | string  |         |
> | assertion | boolean |         |
> | name      | string  |         |
> | val       | int     |         |
> | zfill     | string  |         |
> +-----------+---------+---------+
> Returned 5 row(s) in 0.02s
> Thus if you already had Hive code that was doing SELECT * from an HBase table
> like this, you would get a different result set (different column order) in
> Impala.
> If you tried to copy from an HDFS table via 'INSERT INTO hbase_table SELECT *
> FROM hdfs_table', you would get an error because the columns don't match. If
> you made a separate column family for each column, the discrepancy is masked
> because you need more than one column per column family to experience the
> alphabetical ordering.
> Since Hive is preserving the column order, the relevant info must be there in
> the metastore.
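
The ordering difference described above can be reproduced with a small Python sketch. It is an illustration only, not Impala's or Hive's actual catalog code: the helper names (parse_mapping, declared_order, sorted_order) are hypothetical, and the assumption that the old behavior sorts non-key columns by (column family, qualifier) is inferred from the DESCRIBE output in the report rather than stated in it.

```python
# Hypothetical illustration of the two column orderings in the report.
# None of these helpers exist in Impala or Hive.

COLUMNS = ["id", "val", "zfill", "name", "assertion"]
MAPPING = ":key,cols:val,cols:zfill,cols:name,cols:assertion"


def parse_mapping(mapping, column_names):
    """Pair each column name with its (family, qualifier), in declared order.

    The row key entry ":key" is stored with family and qualifier set to None.
    """
    entries = mapping.split(",")
    if len(entries) != len(column_names):
        raise ValueError("mapping and column list differ in length")
    parsed = []
    for name, entry in zip(column_names, entries):
        if entry == ":key":
            parsed.append((name, None, None))
        else:
            family, qualifier = entry.split(":", 1)
            parsed.append((name, family, qualifier))
    return parsed


def declared_order(parsed):
    """Column names in CREATE TABLE order (what Hive's DESCRIBE shows)."""
    return [name for name, _, _ in parsed]


def sorted_order(parsed):
    """Key column first, then the rest sorted by (family, qualifier).

    Assumed model of the alphabetical ordering seen in impala-shell.
    """
    key_cols = [c for c in parsed if c[1] is None]
    value_cols = sorted(
        (c for c in parsed if c[1] is not None),
        key=lambda c: (c[1], c[2]),
    )
    return [name for name, _, _ in key_cols + value_cols]


if __name__ == "__main__":
    parsed = parse_mapping(MAPPING, COLUMNS)
    print("declared:", declared_order(parsed))
    print("sorted:  ", sorted_order(parsed))
```

With the single 'cols' family from the repro, sorted_order moves assertion and name ahead of val and zfill. If each column is mapped to its own family and the family names happen to sort in declaration order (for example a:val, b:zfill, c:name, d:assertion), both functions return the same list, which is how one family per column can mask the discrepancy.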