[
https://issues.apache.org/jira/browse/FLINK-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alejandro Sellero updated FLINK-13292:
--------------------------------------
Description:
When I try to read an ORC file using flink-orc, a NullPointerException is
thrown.
I think this issue could be related to the closed issue
https://issues.apache.org/jira/browse/FLINK-8230
This happens when trying to read string fields in a nested struct. This is
my schema:
{code:java}
"struct<" +
"operation:int," +
"originalTransaction:bigInt," +
"bucket:int," +
"rowId:bigInt," +
"currentTransaction:bigInt," +
"row:struct<" +
"id:int," +
"headline:string," +
"user_id:int," +
"company_id:int," +
"created_at:timestamp," +
"updated_at:timestamp," +
"link:string," +
"is_html:tinyint," +
"source:string," +
"company_feed_id:int," +
"editable:tinyint," +
"body_clean:string," +
"activitystream_activity_id:bigint," +
"uniqueness_checksum:string," +
"rating:string," +
"review_id:int," +
"soft_deleted:tinyint," +
"type:string," +
"metadata:string," +
"url:string," +
"imagecache_uuid:string," +
"video_id:int" +
">>",{code}
{code:java}
[error] Caused by: java.lang.NullPointerException
[error] at java.lang.String.checkBounds(String.java:384)
[error] at java.lang.String.<init>(String.java:462)
[error] at org.apache.flink.orc.OrcBatchReader.readString(OrcBatchReader.java:1216)
[error] at org.apache.flink.orc.OrcBatchReader.readNonNullBytesColumnAsString(OrcBatchReader.java:328)
[error] at org.apache.flink.orc.OrcBatchReader.readField(OrcBatchReader.java:215)
[error] at org.apache.flink.orc.OrcBatchReader.readNonNullStructColumn(OrcBatchReader.java:453)
[error] at org.apache.flink.orc.OrcBatchReader.readField(OrcBatchReader.java:250)
[error] at org.apache.flink.orc.OrcBatchReader.fillRows(OrcBatchReader.java:143)
[error] at org.apache.flink.orc.OrcRowInputFormat.ensureBatch(OrcRowInputFormat.java:333)
[error] at org.apache.flink.orc.OrcRowInputFormat.reachedEnd(OrcRowInputFormat.java:313)
[error] at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:190)
[error] at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
[error] at java.lang.Thread.run(Thread.java:748){code}
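A note on the trace: java.lang.String.checkBounds can only throw a NullPointerException when the byte[] handed to the String constructor is null, so the reader appears to pass a null buffer to new String(...) for one of the string fields. A standalone snippet that reproduces the same top frames (the class name is purely illustrative):
{code:java}
import java.nio.charset.StandardCharsets;

// Illustrates the first two frames of the trace above: String.checkBounds
// dereferences the byte[] argument, so a null buffer yields the same
// NullPointerException as reported.
public class NullBufferExample {
    public static void main(String[] args) {
        byte[] buffer = null; // stands in for the buffer the ORC reader passes on

        // Throws java.lang.NullPointerException in String.checkBounds(...)
        String value = new String(buffer, 0, 0, StandardCharsets.UTF_8);
        System.out.println(value);
    }
}
{code}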
Instead of using the Table API, I am reading the ORC files in batch mode as
follows:
{code:java}
env
.readFile(
new OrcRowInputFormat(
"",
"SCHEMA_GIVEN_BEFORE",
new HadoopConfiguration()
),
"PATH_TO_FOLDER"
)
.writeAsText("file:///tmp/test/fromOrc")
{code}
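For completeness, a self-contained version of the snippet above as a minimal sketch, assuming the DataSet API and the flink-orc dependency; the class name, the "SCHEMA_GIVEN_BEFORE" placeholder and "PATH_TO_FOLDER" are illustrative only:
{code:java}
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.orc.OrcRowInputFormat;
import org.apache.hadoop.conf.Configuration;

// Minimal sketch of the job that triggers the exception (placeholders kept).
public class OrcReadJob {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // The path given to the constructor is overridden by the path passed
        // to readFile(); the schema string is the one shown above.
        OrcRowInputFormat orcFormat = new OrcRowInputFormat(
                "",
                "SCHEMA_GIVEN_BEFORE",
                new Configuration());

        env.readFile(orcFormat, "PATH_TO_FOLDER")
           .writeAsText("file:///tmp/test/fromOrc");

        env.execute("read-orc-job");
    }
}
{code}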
Thanks for your support
was:
When I try to read an ORC file using flink-orc, a NullPointerException is
thrown.
I think this issue could be related to the closed issue
https://issues.apache.org/jira/browse/FLINK-8230
This happens when trying to read string fields in a nested struct. This is
my schema:
{code:java}
"struct<" +
"operation:int," +
"originalTransaction:bigInt," +
"bucket:int," +
"rowId:bigInt," +
"currentTransaction:bigInt," +
"row:struct<" +
"id:int," +
"headline:string," +
"user_id:int," +
"company_id:int," +
"created_at:timestamp," +
"updated_at:timestamp," +
"link:string," +
"is_html:tinyint," +
"source:string," +
"company_feed_id:int," +
"editable:tinyint," +
"body_clean:string," +
"activitystream_activity_id:bigint," +
"uniqueness_checksum:string," +
"rating:string," +
"kununu_review_id:int," +
"soft_deleted:tinyint," +
"type:string," +
"metadata:string," +
"url:string," +
"imagecache_uuid:string," +
"video_id:int" +
">>",{code}
{code:java}
[error] Caused by: java.lang.NullPointerException
[error] at java.lang.String.checkBounds(String.java:384)
[error] at java.lang.String.<init>(String.java:462)
[error] at org.apache.flink.orc.OrcBatchReader.readString(OrcBatchReader.java:1216)
[error] at org.apache.flink.orc.OrcBatchReader.readNonNullBytesColumnAsString(OrcBatchReader.java:328)
[error] at org.apache.flink.orc.OrcBatchReader.readField(OrcBatchReader.java:215)
[error] at org.apache.flink.orc.OrcBatchReader.readNonNullStructColumn(OrcBatchReader.java:453)
[error] at org.apache.flink.orc.OrcBatchReader.readField(OrcBatchReader.java:250)
[error] at org.apache.flink.orc.OrcBatchReader.fillRows(OrcBatchReader.java:143)
[error] at org.apache.flink.orc.OrcRowInputFormat.ensureBatch(OrcRowInputFormat.java:333)
[error] at org.apache.flink.orc.OrcRowInputFormat.reachedEnd(OrcRowInputFormat.java:313)
[error] at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:190)
[error] at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
[error] at java.lang.Thread.run(Thread.java:748){code}
Instead of using the Table API, I am reading the ORC files in batch mode as
follows:
{code:java}
env
.readFile(
new OrcRowInputFormat(
"",
"SCHEMA_GIVEN_BEFORE",
new HadoopConfiguration()
),
"PATH_TO_FOLDER"
)
.writeAsText("file:///tmp/test/fromOrc")
{code}
Thanks for your support
> NullPointerException when reading a string field in a nested struct from an
> Orc file.
> -------------------------------------------------------------------------------------
>
> Key: FLINK-13292
> URL: https://issues.apache.org/jira/browse/FLINK-13292
> Project: Flink
> Issue Type: Bug
> Components: Connectors / ORC
> Affects Versions: 1.8.0
> Reporter: Alejandro Sellero
> Priority: Major
>
> When I try to read an ORC file using flink-orc, a NullPointerException is
> thrown.
> I think this issue could be related to the closed issue
> https://issues.apache.org/jira/browse/FLINK-8230
> This happens when trying to read string fields in a nested struct. This
> is my schema:
> {code:java}
> "struct<" +
> "operation:int," +
> "originalTransaction:bigInt," +
> "bucket:int," +
> "rowId:bigInt," +
> "currentTransaction:bigInt," +
> "row:struct<" +
> "id:int," +
> "headline:string," +
> "user_id:int," +
> "company_id:int," +
> "created_at:timestamp," +
> "updated_at:timestamp," +
> "link:string," +
> "is_html:tinyint," +
> "source:string," +
> "company_feed_id:int," +
> "editable:tinyint," +
> "body_clean:string," +
> "activitystream_activity_id:bigint," +
> "uniqueness_checksum:string," +
> "rating:string," +
> "review_id:int," +
> "soft_deleted:tinyint," +
> "type:string," +
> "metadata:string," +
> "url:string," +
> "imagecache_uuid:string," +
> "video_id:int" +
> ">>",{code}
> {code:java}
> [error] Caused by: java.lang.NullPointerException
> [error] at java.lang.String.checkBounds(String.java:384)
> [error] at java.lang.String.<init>(String.java:462)
> [error] at org.apache.flink.orc.OrcBatchReader.readString(OrcBatchReader.java:1216)
> [error] at org.apache.flink.orc.OrcBatchReader.readNonNullBytesColumnAsString(OrcBatchReader.java:328)
> [error] at org.apache.flink.orc.OrcBatchReader.readField(OrcBatchReader.java:215)
> [error] at org.apache.flink.orc.OrcBatchReader.readNonNullStructColumn(OrcBatchReader.java:453)
> [error] at org.apache.flink.orc.OrcBatchReader.readField(OrcBatchReader.java:250)
> [error] at org.apache.flink.orc.OrcBatchReader.fillRows(OrcBatchReader.java:143)
> [error] at org.apache.flink.orc.OrcRowInputFormat.ensureBatch(OrcRowInputFormat.java:333)
> [error] at org.apache.flink.orc.OrcRowInputFormat.reachedEnd(OrcRowInputFormat.java:313)
> [error] at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:190)
> [error] at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
> [error] at java.lang.Thread.run(Thread.java:748){code}
> Instead of using the Table API, I am reading the ORC files in batch mode
> as follows:
> {code:java}
> env
> .readFile(
> new OrcRowInputFormat(
> "",
> "SCHEMA_GIVEN_BEFORE",
> new HadoopConfiguration()
> ),
> "PATH_TO_FOLDER"
> )
> .writeAsText("file:///tmp/test/fromOrc")
> {code}
> Thanks for your support
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)