[ https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068721#comment-14068721 ]
Gabriel Reid commented on PHOENIX-1056:
---------------------------------------

Looks interesting. I guess the main difference in terms of added functionality after what PHOENIX-1069 added is that this runs things in one job. Is that pretty accurate?

I think the idea of running things in one job (or at least having a consistent starting point) is worth putting effort into. I’m curious what the performance boost would be from going to one job. That’s actually less of a concern for me, but the consistent starting point is pretty important. I think it might be possible to get the consistent starting point for all jobs (while still having multiple jobs) as well, so this might also be something to consider.

Concerns I have with PHOENIX-1056 as it is now (although of course I understand this is a work in progress):
* It has a lot of code in terms of handling the Phoenix encoding itself
* Arg parsing/usability (i.e. it looks like parameters need to be given in the form of -D parameters or config params)

My preference would be to tackle the “consistent starting point” issue in the current CSV bulk loader. I think that what's being done in this code can be useful in finding a way to do that.

> A ImportTsv tool for phoenix to build table data and all index data.
> --------------------------------------------------------------------
>
>                 Key: PHOENIX-1056
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1056
>             Project: Phoenix
>          Issue Type: Task
>    Affects Versions: 3.0.0
>            Reporter: jay wong
>             Fix For: 3.1
>
>         Attachments: PHOENIX-1056.patch
>
>
> I have just built a tool that builds table data and index table data, just like
> the ImportTsv job.
> http://hbase.apache.org/book/ops_mgt.html#importtsv
> When ImportTsv runs, it writes HFiles under a path named after each CF.
> For example, a table has two CFs, A and B.
> The output is
> ...../outputpath/A
> ...../outputpath/B
> In my job, we have a table, TableOne, and two indexes, IdxOne and IdxTwo.
> The output will be
> ...../outputpath/TableOne/A
> ...../outputpath/TableOne/B
> ...../outputpath/IdxOne
> ...../outputpath/IdxTwo
> If anyone needs it, I will build a clean tool.

--
This message was sent by Atlassian JIRA
(v6.2#6252)