[ https://issues.apache.org/jira/browse/PARQUET-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17750703#comment-17750703 ]

ASF GitHub Bot commented on PARQUET-2330:
-----------------------------------------

sekikn opened a new pull request, #1127:
URL: https://github.com/apache/parquet-mr/pull/1127

   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR"
     - https://issues.apache.org/jira/browse/PARQUET-2330
     - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
   
   I think this fix is so simple and explicit that it doesn't need an additional test. Instead, I ran the fixed version of this command manually and confirmed that it showed the correct position of the invalid record.
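   
   For illustration only, here is a minimal sketch of the kind of behavior being checked. It is not the parquet-cli code: `RecordPositionSketch`, `parse`, and `convertAll` are hypothetical names, and the zero-based position is an assumption of this sketch. It assumes the reported position comes from a counter that has to be advanced once per successfully converted record.
   
   ```java
   import java.util.Arrays;
   import java.util.List;

   public class RecordPositionSketch {
       // Hypothetical stand-in for converting one CSV value to a long.
       static long parse(String value) {
           return Long.parseLong(value);
       }

       // Converts every line; on failure, reports the zero-based position
       // of the record that could not be converted.
       static long[] convertAll(List<String> lines) {
           long[] out = new long[lines.size()];
           int count = 0; // position of the record currently being converted
           try {
               for (String line : lines) {
                   out[count] = parse(line);
                   count++; // if this is never incremented, the message always says "record 0"
               }
           } catch (RuntimeException e) {
               throw new RuntimeException("Failed on record " + count, e);
           }
           return out;
       }

       public static void main(String[] args) {
           // Same values as the /tmp/input example in the Jira description.
           List<String> lines = Arrays.asList(
               "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "a");
           try {
               convertAll(lines);
           } catch (RuntimeException e) {
               System.out.println(e.getMessage()); // prints "Failed on record 10"
           }
       }
   }
   ```
   
   With the increment in place, the sketch reports position 10 for the input from the issue instead of 0.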
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes how to use it.
     - All the public functions and classes in the PR contain Javadoc that explains what they do
   




> Fix convert-csv to show the correct position of the invalid record
> ------------------------------------------------------------------
>
>                 Key: PARQUET-2330
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2330
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cli
>            Reporter: Kengo Seki
>            Assignee: Kengo Seki
>            Priority: Minor
>
> Given the following input:
> {code}
> $ cat /tmp/input
> 0
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> a
> {code}
> running the convert-csv subcommand reports the wrong position (0) for the
> invalid record, which is the last of the eleven input lines:
> {code}
> $ java -cp 'target/parquet-cli-1.14.0-SNAPSHOT.jar:target/dependency/*' org.apache.parquet.cli.Main convert-csv /tmp/input --no-header -o /tmp/output
> Unknown error
> java.lang.RuntimeException: Failed on record 0
>       at org.apache.parquet.cli.commands.ConvertCSVCommand.run(ConvertCSVCommand.java:186)
>       at org.apache.parquet.cli.Main.run(Main.java:163)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
>       at org.apache.parquet.cli.Main.main(Main.java:193)
> Caused by: org.apache.parquet.cli.util.RecordException: Field field_0: value not a ["null","long"]: 'a'
>       at org.apache.parquet.cli.csv.RecordBuilder.makeValue(RecordBuilder.java:125)
>       at org.apache.parquet.cli.csv.RecordBuilder.fillIndexed(RecordBuilder.java:98)
>       at org.apache.parquet.cli.csv.RecordBuilder.makeRecord(RecordBuilder.java:75)
>       at org.apache.parquet.cli.csv.AvroCSVReader.next(AvroCSVReader.java:84)
>       at org.apache.parquet.cli.commands.ConvertCSVCommand.run(ConvertCSVCommand.java:182)
>       ... 3 more
> Caused by: java.lang.NumberFormatException: For input string: "a"
>       at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>       at java.lang.Long.parseLong(Long.java:589)
>       at java.lang.Long.valueOf(Long.java:803)
>       at org.apache.parquet.cli.csv.RecordBuilder.makeValue(RecordBuilder.java:163)
>       at org.apache.parquet.cli.csv.RecordBuilder.makeValue(RecordBuilder.java:178)
>       at org.apache.parquet.cli.csv.RecordBuilder.makeValue(RecordBuilder.java:113)
>       ... 7 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
