[jira] [Commented] (PHOENIX-5258) Add support to parse header from the input CSV file as input columns for CsvBulkLoadTool

Josh Elser (JIRA) Fri, 03 May 2019 11:16:15 -0700


    [ https://issues.apache.org/jira/browse/PHOENIX-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832716#comment-16832716 ]


Josh Elser commented on PHOENIX-5258:
-------------------------------------

{code:java}
+        try(FSDataInputStream inputStream = fs.open(new Path(path))) {
+            String header = new BufferedReader(new InputStreamReader(inputStream)).readLine();
+            inputStream.close();
+            return header;
+        }
{code}
Explicitly closing inputStream is unnecessary: a resource declared in a try-with-resources statement is closed automatically when the block exits.
Can you please declare the BufferedReader in the try-with-resources as well? For example:
{code:java}
try (FSDataInputStream inputStream = fs.open(new Path(path));
     BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream))) {
  // Both resources are closed automatically when the block exits.
  return reader.readLine();
}
{code}
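For reference, here is a self-contained version of the same pattern that runs without a cluster. It is a minimal sketch that uses a plain `java.io.InputStream` in place of `FSDataInputStream` (which extends `InputStream`). The class name `HeaderReader` and the UTF-8 charset are assumptions, not part of the patch.
{code:java}
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; in CsvBulkLoadTool the stream would come from fs.open(new Path(path)).
final class HeaderReader {
    private HeaderReader() {}

    /** Returns the first line of the stream, or null if the stream is empty. */
    static String readHeader(InputStream in) throws IOException {
        // Both resources are closed automatically, in reverse order of declaration,
        // whether the block exits normally or via an exception. No explicit close() needed.
        try (InputStream stream = in;
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] csv = "ID,NAME\n1,foo\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readHeader(new ByteArrayInputStream(csv))); // prints ID,NAME
    }
}
{code}
Closing the stream twice (once via the reader, once directly) is harmless because `close()` on standard Java streams is idempotent.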
Some test cases that appear to be missing:
 * What happens if the user provides {{--header}} but the CSV file has no header? (should error)
 * What happens if the user provides both {{--header}} and {{--skip-header}}? (should error)
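The two error cases could be checked roughly as follows. This is a hypothetical sketch: the class and method names are illustrative, it splits on commas without handling quoted fields, and it assumes table column names are available upper-cased. "No header" is detected heuristically: the first line is treated as a header only if every field names a table column.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical checks for the two error cases above; names are illustrative only.
final class HeaderOptionChecks {
    private HeaderOptionChecks() {}

    /** Rejects the contradictory combination of --header and --skip-header. */
    static void validateFlags(boolean header, boolean skipHeader) {
        if (header && skipHeader) {
            throw new IllegalArgumentException(
                    "--header and --skip-header cannot be used together");
        }
    }

    /**
     * Accepts the first line as a header only if every comma-separated field names a
     * column of the target table (compared upper-cased); otherwise the file is assumed
     * to have no header and an error is raised.
     */
    static List<String> requireHeader(String firstLine, Set<String> upperCaseColumns) {
        if (firstLine == null || firstLine.trim().isEmpty()) {
            throw new IllegalArgumentException("--header given but the input has no first line");
        }
        List<String> columns = new ArrayList<>();
        for (String field : firstLine.split(",", -1)) {
            String name = field.trim().toUpperCase(Locale.ROOT);
            if (!upperCaseColumns.contains(name)) {
                throw new IllegalArgumentException("--header given but '" + field
                        + "' is not a table column; does the file have a header?");
            }
            columns.add(name);
        }
        return columns;
    }
}
{code}
Failing fast on an unknown column name gives a clear error instead of loading a data row as if it were column names.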

Looks pretty close otherwise. Good work.

> Add support to parse header from the input CSV file as input columns for 
> CsvBulkLoadTool
> ----------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-5258
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5258
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Prashant Vithani
>            Priority: Minor
>             Fix For: 4.15.0, 5.1.0
>
>         Attachments: PHOENIX-5258-4.x-HBase-1.4.patch, 
> PHOENIX-5258-master.patch
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, CsvBulkLoadTool does not support reading a header from the input CSV
> file and expects the content of the CSV to match the table schema. Adding
> header support would allow the input columns to be mapped to the table schema
> dynamically, based on the header.
> The proposed solution is to introduce another option for the tool, `--header`.
> If this option is passed, the input column list is constructed by reading
> the first line of the input CSV file.
>  * If there is only one file, read the header from the first line and 
> generate the `ColumnInfo` list.
>  * If there are multiple files, read the header from all the files, and throw 
> an error if the headers across files do not match.
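The single-file and multi-file steps in the proposal can be sketched as below. This is a hypothetical illustration, not the patch's code. It assumes each file's first line has already been read (for example with a try-with-resources reader as above) into a map from path to header line. It uses a naive comma split without quoting support, and it stops at a list of column names rather than building Phoenix `ColumnInfo` objects.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: derive the column list from the header line(s) and, for several
// input files, require every file to carry the same header.
final class HeaderColumns {
    private HeaderColumns() {}

    /** Splits a header line into trimmed column names (naive: no quoted fields). */
    static List<String> parse(String headerLine) {
        if (headerLine == null) {
            throw new IllegalArgumentException("input file is empty, no header to read");
        }
        List<String> columns = new ArrayList<>();
        for (String field : headerLine.split(",", -1)) {
            columns.add(field.trim());
        }
        return columns;
    }

    /**
     * Returns the column list shared by all files; headersByFile maps each input path to
     * its first line. Throws if any file's header differs (names or order) from the first.
     */
    static List<String> commonColumns(Map<String, String> headersByFile) {
        List<String> expected = null;
        String firstFile = null;
        for (Map.Entry<String, String> entry : headersByFile.entrySet()) {
            List<String> columns = parse(entry.getValue());
            if (expected == null) {
                expected = columns;
                firstFile = entry.getKey();
            } else if (!expected.equals(columns)) {
                throw new IllegalArgumentException("Header of " + entry.getKey() + " " + columns
                        + " does not match header of " + firstFile + " " + expected);
            }
        }
        if (expected == null) {
            throw new IllegalArgumentException("no input files given");
        }
        return expected;
    }
}
{code}
The comparison is order-sensitive because the column list is applied positionally to every row. Passing a `LinkedHashMap` keeps the "first file" in the error message deterministic.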



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
