[ https://issues.apache.org/jira/browse/PHOENIX-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195060#comment-15195060 ]
Gabriel Reid commented on PHOENIX-2723:
---------------------------------------

{quote}
well, the logic is quite simple. If there are several input files and one table name - all those files will be loaded to this table. Otherwise the number of tables need to be equal number of inputs.
{quote}

This sounds like the semantics of one input parameter is then changed by the contents of other input parameters, which I'm personally not in favor of.

I think that sticking with a single invocation for loading a single table is the best way to stay in line with the [Principle of least astonishment|https://en.wikipedia.org/wiki/Principle_of_least_astonishment] (mostly because it is in line with how most other tools work), and the advantages of not having to write shell scripts around it and of reduced start-up time don't feel like a big enough win to compromise on simplicity here.

That's just my opinion of course.

> Make BulkLoad able to load several tables at once
> -------------------------------------------------
>
>                 Key: PHOENIX-2723
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2723
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Sergey Soldatov
>            Assignee: Sergey Soldatov
>         Attachments: PHOENIX-2723-1.patch
>
>
> It comes that usually bulk load is required for more than one table and
> usually it's done by running jobs one by one. The idea is to provide lists of
> tables and corresponding input sources to the MR BulkLoad job. Syntax can be
> something like :
> yarn ... CsvBulkLoadTool -t table1,table2,table3 --input input1,input2,input3
> Having map tableName => input during map phase we can determine to which
> table the current split belongs to and produce necessary tableRowKeyPair.
> Any thoughts, suggestions?
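
For reference, the pairing rule under discussion can be written down in a few lines of Java. This is only a sketch of the rule as quoted above, not code from PHOENIX-2723-1.patch or from Phoenix itself; the class name `BulkLoadPairing` and the method `pairInputsWithTables` are hypothetical. It makes the objection concrete: a single value passed to -t means "the target for every input", while several values mean "one target per input, by position".

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Phoenix: resolves which table each
// input path would be loaded into under the rule quoted in this thread.
final class BulkLoadPairing {

    // Returns a map of input -> table, preserving the order of the inputs.
    static Map<String, String> pairInputsWithTables(List<String> tables, List<String> inputs) {
        if (tables.isEmpty() || inputs.isEmpty()) {
            throw new IllegalArgumentException("at least one table and one input are required");
        }
        Map<String, String> inputToTable = new LinkedHashMap<>();
        if (tables.size() == 1) {
            // One table name: every input is loaded into that table.
            for (String input : inputs) {
                inputToTable.put(input, tables.get(0));
            }
            return inputToTable;
        }
        if (tables.size() != inputs.size()) {
            // Several table names: the counts must match.
            throw new IllegalArgumentException(
                "got " + tables.size() + " tables but " + inputs.size() + " inputs");
        }
        // Several table names: pair tables and inputs by position.
        for (int i = 0; i < inputs.size(); i++) {
            inputToTable.put(inputs.get(i), tables.get(i));
        }
        return inputToTable;
    }

    public static void main(String[] args) {
        List<String> tables = Arrays.asList("table1", "table2", "table3");
        List<String> inputs = Arrays.asList("input1", "input2", "input3");
        System.out.println(pairInputsWithTables(tables, inputs));
    }
}
```

In a real job, the map phase would presumably look up the current split's path in a map like this to pick the target table, which is the lookup the issue description refers to.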