[
https://issues.apache.org/jira/browse/SPARK-42965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921870#comment-17921870
]
Asif edited comment on SPARK-42965 at 1/28/25 9:47 PM:
-------------------------------------------------------
I have been hitting this issue consistently since opening a PR yesterday:
https://github.com/apache/spark/pull/49708
The PR adds a metadata entry to the Attribute field in certain situations.
If this bug is fixed, I hope to get a clean build for my PR.
Actually, I am hitting this issue on line 769, which is the branch taken when isRemote is false:

    assert all(
        index_field.struct_field == struct_field
        for index_field, struct_field in zip(index_fields, struct_fields)
    ), (index_fields, struct_fields)
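
To make the failure mode of the assertion above easier to see, here is a minimal sketch that does not depend on PySpark. `FieldStub` and `describe_mismatches` are hypothetical names introduced only for illustration; the sketch assumes that field equality includes metadata, as the reported AssertionError suggests, and which side carries the extra metadata entry is chosen arbitrarily.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Sequence, Tuple


@dataclass
class FieldStub:
    """Hypothetical stand-in for a struct field; this is NOT the PySpark class."""

    name: str
    data_type: str
    nullable: bool
    metadata: Dict[str, Any] = field(default_factory=dict)


def describe_mismatches(
    left: Sequence[FieldStub], right: Sequence[FieldStub]
) -> List[Tuple[int, str, Any, Any]]:
    """Return (position, attribute, left_value, right_value) for every difference."""
    mismatches = []
    for pos, (lhs, rhs) in enumerate(zip(left, right)):
        for attr in ("name", "data_type", "nullable", "metadata"):
            lhs_value, rhs_value = getattr(lhs, attr), getattr(rhs, attr)
            if lhs_value != rhs_value:
                mismatches.append((pos, attr, lhs_value, rhs_value))
    return mismatches


# Same name, type, and nullability; only the metadata differs (assumed placement).
index_side = [FieldStub("bool", "long", False)]
frame_side = [FieldStub("bool", "long", False, {"__autoGeneratedAlias": "true"})]

# Strict equality, as in the assertion, fails because metadata is compared too.
strict_equal = all(a == b for a, b in zip(index_side, frame_side))
mismatches = describe_mismatches(index_side, frame_side)
print(strict_equal, mismatches)
```

A report like this shows at a glance whether a failing comparison is caused only by metadata or by a genuine schema difference.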
was (Author: ashahid7):
I have been hitting this issue consistently since opening a PR yesterday:
https://github.com/apache/spark/pull/49708
The PR adds a metadata entry to the Attribute field in certain situations.
If this bug is fixed, I hope to get a clean build for my PR.
> metadata mismatch for StructField when running some tests.
> ----------------------------------------------------------
>
> Key: SPARK-42965
> URL: https://issues.apache.org/jira/browse/SPARK-42965
> Project: Spark
> Issue Type: Improvement
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
> Fix For: 4.0.0
>
>
> For some reason, the metadata of `StructField` differs in a few tests
> when using Spark Connect, although the function itself works properly.
> For example, running `python/run-tests --testnames
> 'pyspark.pandas.tests.connect.data_type_ops.test_parity_binary_ops
> BinaryOpsParityTests.test_add'` fails with `AssertionError:
> ([InternalField(dtype=int64, struct_field=StructField('bool', LongType(),
> False))], [StructField('bool', LongType(), False)])` because the metadata
> differs, e.g. `{'__autoGeneratedAlias': 'true'}`, even though the fields
> have the same name, type, and nullability, so the function still works.
> Therefore, we have temporarily added a branch for Spark Connect in the code
> so that we can create InternalFrame properly to provide more pandas APIs in
> Spark Connect. If a clear cause is found, we may need to revert it back to
> its original state.
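
As a rough illustration of the kind of temporary branch described above, the sketch below compares fields strictly by default and ignores metadata only when asked to. It is an assumption about the general shape of such a workaround, not the code in Spark; `FieldStub`, `fields_match`, and the `ignore_metadata` flag are hypothetical names.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, Sequence


@dataclass
class FieldStub:
    """Hypothetical stand-in for a struct field; this is NOT the PySpark class."""

    name: str
    data_type: str
    nullable: bool
    metadata: Dict[str, Any] = field(default_factory=dict)


def fields_match(
    left: Sequence[FieldStub],
    right: Sequence[FieldStub],
    ignore_metadata: bool = False,
) -> bool:
    """Compare two field lists pairwise, optionally ignoring metadata."""
    if len(left) != len(right):
        return False
    if ignore_metadata:
        # Blank out metadata on copies so the inputs are left untouched.
        left = [replace(f, metadata={}) for f in left]
        right = [replace(f, metadata={}) for f in right]
    return all(a == b for a, b in zip(left, right))


plain = [FieldStub("bool", "long", False)]
aliased = [FieldStub("bool", "long", False, {"__autoGeneratedAlias": "true"})]
renamed = [FieldStub("flag", "long", False)]

print(fields_match(plain, aliased))                        # strict comparison
print(fields_match(plain, aliased, ignore_metadata=True))  # relaxed comparison
print(fields_match(plain, renamed, ignore_metadata=True))  # real schema difference
```

Keeping the relaxed comparison behind an explicit flag means a later fix of the root cause only has to remove the flag to restore the strict check.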
--
This message was sent by Atlassian Jira
(v8.20.10#820010)