[ https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068721#comment-14068721 ]
Gabriel Reid commented on PHOENIX-1056:
---------------------------------------
Looks interesting. I guess the main difference in functionality, compared to
what PHOENIX-1069 added, is that this runs everything in one job. Is that
pretty accurate?
I think the idea of running things in one job (or at least having a consistent
starting point) is worth putting effort into. I’m curious what the performance
boost of going to one job would be, although that’s actually less of a concern
for me; the consistent starting point is what really matters. I think it might
also be possible to get a consistent starting point for all jobs while still
having multiple jobs, so that is worth considering as well.
Concerns I have with PHOENIX-1056 as it is now (although of course I understand
this is a work in progress):
* It contains a lot of code for handling the Phoenix encoding itself
* Arg parsing/usability (i.e. it looks like parameters need to be given in the
form of -D parameters or config params)
My preference would be to tackle the “consistent starting point” issue in the
current CSV bulk loader. I think that what's being done in this code can be
useful in finding a way to do that.
> A ImportTsv tool for phoenix to build table data and all index data.
> --------------------------------------------------------------------
>
> Key: PHOENIX-1056
> URL: https://issues.apache.org/jira/browse/PHOENIX-1056
> Project: Phoenix
> Issue Type: Task
> Affects Versions: 3.0.0
> Reporter: jay wong
> Fix For: 3.1
>
> Attachments: PHOENIX-1056.patch
>
>
> I have just built a tool for building table data and index table data,
> similar to the ImportTsv job.
> http://hbase.apache.org/book/ops_mgt.html#importtsv
> When ImportTsv runs, it writes HFiles into one path per column family (CF).
> For example, if a table has two CFs, A and B,
> the output is:
> ...../outputpath/A
> ...../outputpath/B
> In my job, we have a table, TableOne, and two indexes, IdxOne and IdxTwo.
> The output will be:
> ...../outputpath/TableOne/A
> ...../outputpath/TableOne/B
> ...../outputpath/IdxOne
> ...../outputpath/IdxTwo
> If anyone needs it, I will build a clean tool.
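To make the two directory layouts in the quoted description concrete, here is a minimal Java sketch that only computes the output directories; it does not write any HFiles. The class and method names (`OutputLayout`, `importTsvDirs`, `phoenixDirs`) and the `/tmp/outputpath` base path are hypothetical, and the index directories follow the description literally (one directory per index, with no per-CF subdirectory shown).

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating the output layouts described in PHOENIX-1056.
class OutputLayout {
    // Plain ImportTsv layout: one directory per column family under the output path.
    static List<Path> importTsvDirs(Path outputPath, List<String> columnFamilies) {
        List<Path> dirs = new ArrayList<>();
        for (String cf : columnFamilies) {
            dirs.add(outputPath.resolve(cf));
        }
        return dirs;
    }

    // Layout described for the proposed tool: the data table's CF directories are
    // nested under the table name, followed by one directory per index table.
    static List<Path> phoenixDirs(Path outputPath, String table,
                                  List<String> columnFamilies, List<String> indexes) {
        List<Path> dirs = importTsvDirs(outputPath.resolve(table), columnFamilies);
        for (String index : indexes) {
            dirs.add(outputPath.resolve(index));
        }
        return dirs;
    }

    public static void main(String[] args) {
        Path out = Paths.get("/tmp/outputpath"); // hypothetical base path
        List<Path> dirs = phoenixDirs(out, "TableOne",
                Arrays.asList("A", "B"), Arrays.asList("IdxOne", "IdxTwo"));
        for (Path p : dirs) {
            System.out.println(p);
        }
    }
}
```

Nesting the data table's CFs one level deeper keeps them from colliding with the index directories, which all live side by side under the same output path.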
--
This message was sent by Atlassian JIRA
(v6.2#6252)