chamikaramj opened a new issue, #32795:
URL: https://github.com/apache/beam/issues/32795

   ### What happened?
   
I'm getting the following error (via the ExpansionService) when upgrading the Iceberg 
Write transform.
   
   ```
   2024/10/16 02:58:02 E1016 02:58:02.041566      11 
managed_transforms_worker_main.cc:138] Failed to upgrade using the expansion 
service manager: INTERNAL: Expansion request failed: 
java.lang.IllegalArgumentException: unable to serialize SchemaCoder<Schema: 
Fields:
   2024/10/16 02:58:02 Field{name=tableIdentifierString, description=, 
type=STRING NOT NULL, options={{}}}
   2024/10/16 02:58:02 Field{name=serializableDataFile, description=, 
type=ROW<path STRING NOT NULL, fileFormat STRING NOT NULL, recordCount INT64 
NOT NULL, fileSizeInBytes INT64 NOT NULL, partitionPath STRING NOT NULL, 
partitionSpecId INT32 NOT NULL, keyMetadata BYTES, splitOffsets ARRAY<INT64 NOT 
NULL>, columnSizes MAP<INT32 NOT NULL, INT64 NOT NULL>, valueCounts MAP<INT32 
NOT NULL, INT64 NOT NULL>, nullValueCounts MAP<INT32 NOT NULL, INT64 NOT NULL>, 
nanValueCounts MAP<INT32 NOT NULL, INT64 NOT NULL>, lowerBounds MAP<INT32 NOT 
NULL, BYTES NOT NULL>, upperBounds MAP<INT32 NOT NULL, BYTES NOT NULL>> NOT 
NULL, options={{}}}
   2024/10/16 02:58:02 Encoding positions:
   2024/10/16 02:58:02 {tableIdentifierString=0, serializableDataFile=1}
   2024/10/16 02:58:02 Options:{{}}UUID: 1373ba11-1080-4271-b79a-985f2ff03727  
UUID: 1373ba11-1080-4271-b79a-985f2ff03727 delegateCoder: 
org.apache.beam.sdk.coders.Coder$ByteBuddy$X4azj9mR@4a19cae6
   2024/10/16 02:58:02     at 
org.apache.beam.sdk.util.SerializableUtils.serializeToByteArray(SerializableUtils.java:59)
   2024/10/16 02:58:02     at 
org.apache.beam.sdk.util.construction.CoderTranslation.toCustomCoder(CoderTranslation.java:158)
   2024/10/16 02:58:02     at 
org.apache.beam.sdk.util.construction.CoderTranslation.toProto(CoderTranslation.java:118)
   2024/10/16 02:58:02     at 
org.apache.beam.sdk.util.construction.SdkComponents.registerCoder(SdkComponents.java:284)
   2024/10/16 02:58:02     at 
org.apache.beam.sdk.util.construction.PCollectionTranslation.toProto(PCollectionTranslation.java:35)
   2024/10/16 02:58:02     at 
org.apache.beam.sdk.util.construction.SdkComponents.registerPCollection(SdkComponents.java:239)
   2024/10/16 02:58:02     at 
org.apache.beam.sdk.util.construction.PTransformTranslation.translateAppliedPTransform(PTransformTranslation.java:610)
   2024/10/16 02:58:02     at 
org.apache.beam.sdk.util.construction.ParDoTranslation$ParDoTranslator.translate(ParDoTranslation.java:184)
   2024/10/16 02:58:02     at 
org.apache.beam.sdk.util.construction.PTransformTranslation.toProto(PTransformTranslation.java:277)
   2024/10/16 02:58:02     at 
org.apache.beam.sdk.util.construction.SdkComponents.registerPTransform(SdkComponents.java:183)
   2024/10/16 02:58:02     at 
org.apache.beam.sdk.util.construction.PipelineTranslation$1.visitPrimitiveTransform(PipelineTranslation.java:96)
   2024/10/16 02:58:02     at 
org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:593)
   2024/10/16 02:58:02     at 
org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:585)
   ```
   
It looks like the coder for `FileWriteResult` fails to serialize during pipeline 
translation. I'm not sure why the SchemaCoder wasn't properly resolved for 
`SerializableDataFile` (so it ended up falling back to `SerializableCoder`).
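   For context, the immediate failure mode is plain Java serialization: 
`SerializableUtils.serializeToByteArray` writes the coder with `ObjectOutputStream`, 
which throws if the coder (or anything it references, such as the ByteBuddy-generated 
`delegateCoder` in the log above) is not `Serializable`. A minimal, Beam-free sketch of 
the same failure category (class names here are illustrative, not Beam's):

   ```java
   import java.io.ByteArrayOutputStream;
   import java.io.IOException;
   import java.io.NotSerializableException;
   import java.io.ObjectOutputStream;
   import java.io.Serializable;

   public class SerializationSketch {
       // Stands in for a generated delegate coder that does not implement Serializable.
       static class GeneratedDelegate { }

       // Stands in for SchemaCoder: Serializable itself, but it holds a
       // non-serializable delegate, so writeObject fails on that field.
       static class WrapperCoder implements Serializable {
           final GeneratedDelegate delegate = new GeneratedDelegate();
       }

       // Attempts Java serialization; returns false on NotSerializableException,
       // which is the same error class surfaced as "unable to serialize SchemaCoder".
       static boolean serialize(Object o) {
           try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
               out.writeObject(o);
               return true;
           } catch (NotSerializableException e) {
               return false;
           } catch (IOException e) {
               throw new RuntimeException(e);
           }
       }

       public static void main(String[] args) {
           // Fails: the wrapper is Serializable, but its delegate field is not.
           System.out.println(serialize(new WrapperCoder()));
       }
   }
   ```

   This is why a properly resolved `SchemaCoder` (whose delegate is serializable) 
round-trips fine, while the fallback path here does not.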
   
   ### Issue Priority
   
   Priority: 1 (data loss / total loss of function)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [X] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [X] Component: IO connector
   - [ ] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Infrastructure
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner

