[jira] [Commented] (PHOENIX-5258) Add support to parse header from the input CSV file as input columns for CsvBulkLoadTool

Hadoop QA (JIRA) Fri, 03 May 2019 04:34:58 -0700


    [ 
https://issues.apache.org/jira/browse/PHOENIX-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832441#comment-16832441
 ]


Hadoop QA commented on PHOENIX-5258:
------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12967757/PHOENIX-5258-master.patch
  against master branch at commit 4eec41f3f2b04865b6d59ebd3fbd3aa1e0a0fd80.
  ATTACHMENT ID: 12967757

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:red}-1 release audit{color}.  The applied patch generated 6 release 
audit warnings (more than the master's current 0 warnings).

    {color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
    +            stmt.execute("CREATE TABLE S.TABLE14 (ID INTEGER NOT NULL PRIMARY KEY, NAME VARCHAR, TYPE VARCHAR, CATEGORY VARCHAR)");
    +                        "Headers in provided input files are different. Headers must be unique for all input files"
    +    static final Option SKIP_HEADER_OPT = new Option("k", "skip-header", false, "Skip the first line of CSV files (the header)");
    +    static final Option HEADER_OPT = new Option("r", "header", false, "Parses the first line of CSV as the header");
    +    private List<String> parseCsvHeaders(CommandLine cmdLine, Configuration conf) throws IOException {
    +                "Headers in provided input files are different. Headers must be unique for all input files"
    +    private List<String> fetchAllHeaders(Iterable<String> paths, Configuration conf) throws IOException {

     {color:red}-1 core tests{color}.  The patch failed these unit tests:
     
./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.IndexRebuildTaskIT
./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.join.HashJoinMoreIT
./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.UpgradeIT
./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.index.MutableIndexSplitForwardScanIT

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/2550//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/2550//artifact/patchprocess/patchReleaseAuditWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/2550//console

This message is automatically generated.

> Add support to parse header from the input CSV file as input columns for 
> CsvBulkLoadTool
> ----------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-5258
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5258
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Prashant Vithani
>            Priority: Minor
>             Fix For: 4.15.0, 5.1.0
>
>         Attachments: PHOENIX-5258-4.x-HBase-1.4.patch, 
> PHOENIX-5258-master.patch
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, CsvBulkLoadTool does not support reading a header from the input
> CSV and expects the content of the CSV to match the table schema. Support
> for the header can be added to dynamically map the schema to the header.
> The proposed solution is to introduce another option for the tool, `--header`.
> If this option is passed, the input column list is constructed by reading
> the first line of the input CSV file.
>  * If there is only one file, read the header from the first line and
> generate the `ColumnInfo` list.
>  * If there are multiple files, read the header from all the files, and throw
> an error if the headers across files do not match.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
