[ 
https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068721#comment-14068721
 ] 

Gabriel Reid commented on PHOENIX-1056:
---------------------------------------

Looks interesting. As far as I can tell, the main functional difference from 
what PHOENIX-1069 added is that this runs everything in one job. Is that 
accurate?

I think the idea of running things in one job (or at least having a consistent 
starting point) is worth putting effort into. I’m curious how much of a 
performance boost going to one job would give, although that’s less of a 
concern for me than the consistent starting point, which is pretty important. 
It might be possible to get a consistent starting point for all jobs while 
still having multiple jobs, so that is also worth considering.

Concerns I have with PHOENIX-1056 as it is now (although of course I understand 
this is a work in progress):
* It contains a lot of code for handling the Phoenix encoding itself
* Arg parsing/usability (i.e. it looks like parameters need to be given in the 
form of -D parameters or config params)
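
To illustrate the usability point, here is a minimal sketch in Python. It is not taken from the patch or from the CSV bulk loader, and every property and option name in it (`example.table`, `example.input`, `--table`, `--input`, `--output`) is hypothetical. It contrasts free-form -D style key/value parameters with declared command-line options.

```python
import argparse


def parse_d_style(argv):
    """Collect -Dkey=value pairs into a dict (hypothetical keys).

    Nothing here knows which keys are valid, so a misspelled key is
    stored like any other and only surfaces later, if at all.
    """
    conf = {}
    for arg in argv:
        if arg.startswith("-D") and "=" in arg:
            key, value = arg[2:].split("=", 1)
            conf[key] = value
    return conf


def build_parser():
    """Declare named options up front (hypothetical names).

    argparse rejects unknown options and reports missing required ones,
    and it generates --help text from these declarations.
    """
    parser = argparse.ArgumentParser(prog="bulk-load-sketch")
    parser.add_argument("--table", required=True, help="target table name")
    parser.add_argument("--input", required=True, help="input file path")
    parser.add_argument("--output", help="optional HFile output path")
    return parser


# The second key is deliberately misspelled; it is accepted silently.
d_conf = parse_d_style(["-Dexample.table=TABLEONE", "-Dexample.inptu=/data/in.tsv"])

opts = build_parser().parse_args(["--table", "TABLEONE", "--input", "/data/in.tsv"])
```

With declared options, a typo such as `--inptu` fails immediately with a usage message, whereas the -D form carries the misspelled key along unnoticed.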

My preference would be to tackle the “consistent starting point” issue in the 
current CSV bulk loader. I think that what's being done in this code can be 
useful in finding a way to do that.

> A ImportTsv tool for phoenix to build table data and all index data.
> --------------------------------------------------------------------
>
>                 Key: PHOENIX-1056
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1056
>             Project: Phoenix
>          Issue Type: Task
>    Affects Versions: 3.0.0
>            Reporter: jay wong
>             Fix For: 3.1
>
>         Attachments: PHOENIX-1056.patch
>
>
> I have just built a tool that builds table data and index table data, just 
> like the ImportTsv job.
> http://hbase.apache.org/book/ops_mgt.html#importtsv
> When ImportTsv runs, it writes HFiles to a path named after each CF.
> For example, a table has two CFs, A and B.
> The output is 
> ...../outputpath/A
> ...../outputpath/B
> In my job, we have a table, TableOne, and two indexes, IdxOne and IdxTwo.
> The output will be
> ...../outputpath/TableOne/A
> ...../outputpath/TableOne/B
> ...../outputpath/IdxOne
> ...../outputpath/IdxTwo.
> If anyone needs it, I will build a clean tool.
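
The output layout described in the quoted issue can be sketched as a small Python function. This is an illustration only, not code from the patch: the function name `hfile_output_dirs` and the root `/tmp/outputpath` are hypothetical (the issue elides the real root), and it assumes, as the listing suggests, that index directories have no column-family level.

```python
import posixpath


def hfile_output_dirs(output_root, table, column_families, indexes):
    """Return the output directories for one data table and its indexes.

    The data table gets one directory per column family under its own
    name; each index table gets a single directory at the top level.
    """
    dirs = [posixpath.join(output_root, table, cf) for cf in column_families]
    dirs += [posixpath.join(output_root, index) for index in indexes]
    return dirs


# Hypothetical root path; table, CF, and index names are from the issue.
dirs = hfile_output_dirs(
    "/tmp/outputpath", "TableOne", ["A", "B"], ["IdxOne", "IdxTwo"]
)
for d in dirs:
    print(d)
```

Keeping each table and index under its own directory means the output of a single job can be handed to the bulk-load step one table at a time.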



--
This message was sent by Atlassian JIRA
(v6.2#6252)
