[
https://issues.apache.org/jira/browse/SPARK-30855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Benoit Roy updated SPARK-30855:
-------------------------------
Description:
I'm in the early phase of writing a custom Spark data source for MessagePack.
I have a basic implementation running in most cases, but I encountered this
issue when I tried to implement a UnaryExpression, in much the same way
'from_json' and 'from_avro' do, that converts a binary value (holding
MessagePack) into Spark Catalyst types.
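For reference, here is a minimal sketch of the shape of expression I mean; FromMsgPack, its decoding step, and the from_msgpack wrapper below are placeholders, not the actual implementation from the attached archive:
```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.{ExpectsInputTypes, Expression, UnaryExpression}
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback
import org.apache.spark.sql.types.{AbstractDataType, BinaryType, DataType, StructType}

// Placeholder sketch only: a UnaryExpression that turns a MessagePack-encoded
// binary column into a struct, modeled after the shape of from_json/from_avro.
case class FromMsgPack(child: Expression, schema: StructType)
  extends UnaryExpression with ExpectsInputTypes with CodegenFallback {

  override def inputTypes: Seq[AbstractDataType] = Seq(BinaryType)
  override def dataType: DataType = schema
  override def nullable: Boolean = true

  override protected def nullSafeEval(input: Any): Any = {
    val bytes = input.asInstanceOf[Array[Byte]]
    // Decode `bytes` into an InternalRow matching `schema` with a MessagePack
    // reader; elided here, the real version is in the attached test project.
    ???
  }
}

object MsgPackFunctions {
  // Column-level wrapper, analogous to org.apache.spark.sql.functions.from_json.
  def from_msgpack(e: Column, schema: StructType): Column = new Column(FromMsgPack(e.expr, schema))
}
```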
When using 'from_msgpack', which returns one InternalRow, all is well. However,
if I use 'from_msgpack2', which uses 'explode' to return multiple InternalRows,
any follow-up select on the DataFrame results in:
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding
attribute, tree: _gen_alias_40#40 Couldn't find _gen_alias_40#40 in [result#4]
Example code that produces the issue:
```scala
df.select(from_msgpack2(col("value"), mp_schema).as("result"))
  .select("result.*")
  .show()
```
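For contrast, the single-row variant described above works as expected against the same DataFrame and schema:
```scala
// Works: from_msgpack returns a single struct per input row.
df.select(from_msgpack(col("value"), mp_schema).as("result"))
  .select("result.*")
  .show()
```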
I've attached the code base as an archive to help you reproduce the issue. The
test class org.apache.spark.sql.messagepack.MessagePackSparkIssue contains test
cases that will help you reproduce the exception.
It is entirely possible, and quite likely, that this is caused by a bad
implementation, or a wrong expectation, on my part. Any guidance, help, or
feedback on the cause of this exception would be greatly appreciated. I've
scoured the traditional sources for help resolving this but haven't found
anything related to this case.
Cheers and Thanks.
> Issue with Binding attribute: Couldn't find _gen_alias_ when using 'explode'
> function.
> --------------------------------------------------------------------------------------
>
> Key: SPARK-30855
> URL: https://issues.apache.org/jira/browse/SPARK-30855
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Benoit Roy
> Priority: Major
> Attachments: messagepack-datasource.zip
>