[
https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055387#comment-14055387
]
Jeffrey Zhong commented on PHOENIX-1056:
----------------------------------------
Oh, I'm late to see this JIRA. I had a different patch that loads table and
index data in one go by submitting multiple MR jobs to load the data
concurrently for CsvBulkLoadTool.
[~jaywong]'s approach uses a single MapReduce job to load both the table data
and the index data. I checked the patch and the underlying idea is very good,
but it has one issue: the partitioning is based on the primary table. The
generated index HFiles therefore aren't aligned with the index table's own
partitioning, so loading them will incur extra writes.
Let me first create a separate JIRA to improve CsvBulkLoadTool to build
indexes at load time; later we can decide whether to migrate CsvBulkLoadTool
to this JIRA's custom mapper, reducer, and MultiHFileOutputFormat.
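The misalignment can be sketched with a toy partitioner (all class names, row keys, and split points below are made up for illustration, not Phoenix or HBase code). It mimics a TotalOrderPartitioner-style lookup: a row belongs to the partition indexed by the first split point greater than its key. An index row partitioned by the primary table's split points can land in a different partition than the index table's own regions expect:

```java
import java.util.Arrays;

// Illustrative sketch only -- names are hypothetical, not Phoenix APIs.
public class PerTablePartitioner {

    // TotalOrderPartitioner-style lookup: binary-search the split points
    // and map the row key to the region it falls into.
    static int partitionFor(String rowKey, String[] splitPoints) {
        int idx = Arrays.binarySearch(splitPoints, rowKey);
        // An exact match belongs to the region starting at that split point.
        return idx >= 0 ? idx + 1 : -(idx + 1);
    }

    public static void main(String[] args) {
        // Hypothetical split points: each table has two regions.
        String[] primarySplits = {"row_m"};
        String[] indexSplits   = {"idxval_c"};

        String indexRow = "idxval_d"; // an index row key

        // Partitioned by the PRIMARY table's splits (the issue noted above),
        // the index row lands in partition 0 ...
        int byPrimary = partitionFor(indexRow, primarySplits);
        // ... but the index table's own region for that key is partition 1,
        // so the generated HFile straddles a region boundary and must be
        // split and rewritten at load time -- the extra writes noted above.
        int byIndex = partitionFor(indexRow, indexSplits);

        System.out.println(byPrimary + " vs " + byIndex); // prints "0 vs 1"
    }
}
```

Partitioning each table's rows by that table's own split points is what lets the completed HFiles be moved into regions without further splitting.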
> A ImportTsv tool for phoenix to build table data and all index data.
> --------------------------------------------------------------------
>
> Key: PHOENIX-1056
> URL: https://issues.apache.org/jira/browse/PHOENIX-1056
> Project: Phoenix
> Issue Type: Task
> Affects Versions: 3.0.0
> Reporter: jay wong
> Fix For: 3.1
>
> Attachments: PHOENIX-1056.patch
>
>
> I have just built a tool that builds table data and index table data, much
> like the ImportTsv job:
> http://hbase.apache.org/book/ops_mgt.html#importtsv
> When ImportTsv runs, it writes HFiles into a per-column-family path.
> For example, if a table has two column families, A and B,
> the output is
> ...../outputpath/A
> ...../outputpath/B
> In my job, we have one table, TableOne, and two indexes, IdxOne and IdxTwo.
> The output will be
> ...../outputpath/TableOne/A
> ...../outputpath/TableOne/B
> ...../outputpath/IdxOne
> ...../outputpath/IdxTwo
> If anyone needs it, I will build a clean version of the tool.
--
This message was sent by Atlassian JIRA
(v6.2#6252)