[ https://issues.apache.org/jira/browse/PHOENIX-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194981#comment-15194981 ]
Sergey Soldatov commented on PHOENIX-2723:
------------------------------------------
Well, the logic is quite simple. If there are several input files and one table
name, all of those files are loaded into that table. Otherwise the number of
tables needs to equal the number of inputs. The advantages are that it avoids
writing iteration scripts, reduces job creation and scheduling time, and, in
theory, loads the cluster more efficiently.
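
To make the pairing rule concrete, here is a minimal sketch in Python (not the actual CsvBulkLoadTool code, which is Java). The function name `pair_inputs_with_tables` and the error messages are hypothetical and only illustrate the rule described above: one table takes every input, otherwise the two lists must have the same length and are paired by position.

```python
def pair_inputs_with_tables(tables, inputs):
    """Return a list of (input_path, table_name) pairs.

    Hypothetical helper illustrating the rule from the comment:
    - one table name: every input is loaded into that table;
    - otherwise: the number of tables must equal the number of inputs,
      and they are paired by position.
    """
    if not tables or not inputs:
        raise ValueError("at least one table and one input are required")
    if len(tables) == 1:
        # Single target table: all inputs go to it.
        return [(path, tables[0]) for path in inputs]
    if len(tables) != len(inputs):
        raise ValueError(
            "expected %d inputs for %d tables, got %d"
            % (len(tables), len(tables), len(inputs))
        )
    # Positional pairing: first input goes to first table, and so on.
    return list(zip(inputs, tables))


if __name__ == "__main__":
    print(pair_inputs_with_tables(["table1"], ["input1", "input2"]))
    print(pair_inputs_with_tables(["table1", "table2"], ["input1", "input2"]))
```

Rejecting mismatched list lengths up front means a bad invocation would fail before any job is submitted, rather than partway through a load.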
> Make BulkLoad able to load several tables at once
> -------------------------------------------------
>
> Key: PHOENIX-2723
> URL: https://issues.apache.org/jira/browse/PHOENIX-2723
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Sergey Soldatov
> Assignee: Sergey Soldatov
>
> Bulk load is usually needed for more than one table, and it is usually done
> by running the jobs one by one. The idea is to provide lists of tables and
> corresponding input sources to the MR BulkLoad job. The syntax could be
> something like:
> yarn ... CsvBulkLoadTool -t table1,table2,table3 --input input1,input2,input3
> With a tableName => input map available during the map phase, we can determine
> which table the current split belongs to and produce the necessary tableRowKeyPair.
> Any thoughts or suggestions?
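
The map-phase lookup proposed in the description can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the Phoenix implementation: the function name `table_for_split` is hypothetical, the mapping is keyed by input path (the inverse of the tableName => input map mentioned above, since the lookup starts from the split's file), and a split is assumed to belong to the input whose path equals or contains the split's file path.

```python
from pathlib import PurePosixPath


def table_for_split(split_path, input_to_table):
    """Return the table name for the input that contains split_path.

    Hypothetical helper: input_to_table maps an input path (file or
    directory) to a table name. If several inputs contain the split,
    the deepest (most specific) input path wins.
    """
    path = PurePosixPath(split_path)
    best_root = None
    best_table = None
    for input_path, table in input_to_table.items():
        root = PurePosixPath(input_path)
        # The split matches if it is the input file itself or lies under it.
        if path == root or root in path.parents:
            if best_root is None or len(root.parts) > len(best_root.parts):
                best_root, best_table = root, table
    if best_table is None:
        raise KeyError("no input path contains split %r" % split_path)
    return best_table


if __name__ == "__main__":
    mapping = {"/data/input1": "table1", "/data/input2": "table2"}
    print(table_for_split("/data/input1/part-00000.csv", mapping))
```

A mapper would resolve the table once per split and then tag each emitted row key with that table name.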