[
https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905918#comment-15905918
]
Eugene Koifman edited comment on HIVE-16177 at 3/11/17 12:37 AM:
-----------------------------------------------------------------
We currently only allow converting a bucketed ORC table to Acid.
Some possibilities:
Check how splits are generated for "isOriginal" files. If
RecordReader.getRowNumber() is not smart enough to produce an ordinal relative
to the beginning of the file, then we must be creating one split per bucket.
If it is smart enough, we may be splitting each file into multiple pieces. A
throwaway probe like the one below could confirm which it is.
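A minimal sketch of such a probe, assuming Hive's ORC reader API; the file
path and byte range are placeholders to point at a real bucket file and split:
{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.Reader;
import org.apache.hadoop.hive.ql.io.orc.RecordReader;

public class RowNumberProbe {
  public static void main(String[] args) throws Exception {
    Path bucketFile = new Path(args[0]);         // e.g. .../000001_0
    long splitOffset = Long.parseLong(args[1]);  // byte range a split would get
    long splitLength = Long.parseLong(args[2]);
    Reader reader = OrcFile.createReader(bucketFile,
        OrcFile.readerOptions(new Configuration()));
    RecordReader rows = reader.rowsOptions(
        new Reader.Options().range(splitOffset, splitLength));
    Object row = null;
    while (rows.hasNext()) {
      // If getRowNumber() is "smart", this prints ordinals counted from the
      // start of the file, not from the start of the requested byte range.
      System.out.println("row number = " + rows.getRowNumber());
      row = rows.next(row);
    }
    rows.close();
  }
}
{noformat}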
Either way, OriginalReaderPair could look at which copy_N file it is reading,
find all copy_M files with M < N, and get the number of rows in each. The sum
of those row counts is the starting point from which to number the rows in
copy_N, roughly as sketched below.
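A minimal sketch of that computation, assuming the copy index can be parsed
from the file name; the helper is hypothetical, not existing Hive code:
{noformat}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;

public class CopyFileOffsets {
  // First rowid for bucketFile_copy_N = rows in the base file plus rows in
  // copies 1..N-1, so numbering continues instead of restarting at 0.
  static long startingRowId(Configuration conf, Path bucketDir,
                            String bucketFile, int n) throws IOException {
    long offset = OrcFile.createReader(new Path(bucketDir, bucketFile),
        OrcFile.readerOptions(conf)).getNumberOfRows();
    for (int m = 1; m < n; m++) {
      offset += OrcFile.createReader(
          new Path(bucketDir, bucketFile + "_copy_" + m),
          OrcFile.readerOptions(conf)).getNumberOfRows();
    }
    return offset;
  }
}
{noformat}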
Alternatively, we could make each split include all the files of a bucket, in
order, and just keep numbering rows across them.
Having a smart getRowNumber() would be better since it allows splitting the
table into many pieces; otherwise read parallelism is limited to the number of
buckets. For a large pre-Acid table that could look like a big drop in
performance once it's converted to Acid but before the first major compaction.
Another possibility is to assign a different transaction ID to each copy_N
file. We'd have to use a negative number, but this may be the simplest fix if
it works; see the illustration below.
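To illustrate with RecordIdentifier (the -N-per-copy_N scheme here is just an
assumption for the example, not the patch):
{noformat}
import org.apache.hadoop.hive.ql.io.RecordIdentifier;

// With a distinct synthetic (negative) transaction id per copy_N file, the
// (transactionid, bucketid, rowid) triple stays unique even though rowid
// restarts at 0 in every file.
RecordIdentifier fromBase  = new RecordIdentifier(0, 1, 0);   // 000001_0
RecordIdentifier fromCopy1 = new RecordIdentifier(-1, 1, 0);  // 000001_0_copy_1
assert !fromBase.equals(fromCopy1);
{noformat}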
> non Acid to acid conversion doesn't handle _copy_N files
> --------------------------------------------------------
>
> Key: HIVE-16177
> URL: https://issues.apache.org/jira/browse/HIVE-16177
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Priority: Critical
> Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch
>
>
> {noformat}
> create table T(a int, b int) clustered by (a) into 2 buckets stored as orc
> TBLPROPERTIES('transactional'='false');
> insert into T(a,b) values(1,2);
> insert into T(a,b) values(1,3);
> alter table T SET TBLPROPERTIES ('transactional'='true');
> {noformat}
> We should now have bucket files 000001_0 and 000001_0_copy_1, but
> OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that copy_N files
> can exist and numbers the rows in each bucket file from 0, generating
> duplicate ROW__IDs:
> {noformat}
> select ROW__ID, INPUT__FILE__NAME, a, b from T
> {noformat}
> produces
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/000001_0,1,2
> {"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/000001_0_copy_1,1,3
> {noformat}
> [~owen.omalley], do you have any thoughts on a good way to handle this?
> The attached patch has a few changes to make Acid recognize copy_N files at
> all, but this is just a prerequisite. The new UT demonstrates the issue.