[ https://issues.apache.org/jira/browse/BEAM-7210?focusedWorklogId=238157&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-238157 ]
ASF GitHub Bot logged work on BEAM-7210:
----------------------------------------
Author: ASF GitHub Bot
Created on: 06/May/19 21:36
Start Date: 06/May/19 21:36
Worklog Time Spent: 10m
Work Description: reuvenlax commented on issue #8474: [BEAM-7210] Fix
non-deterministic row access
URL: https://github.com/apache/beam/pull/8474#issuecomment-489791371
So positional access is deterministic as long as you resolve the field ids beforehand. E.g.,
the following is deterministic:
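@DefaultSchema(AutoValueSchema.class)
@AutoValue
abstract class Foo {...}
PCollection<Foo> foos = readFoos();
// Resolve the field name to a position once, at pipeline construction time.
int userIdField = foos.getSchema().indexOf("userId");
foos.apply(ParDo.of(new DoFn<Foo, Void>() {
  @ProcessElement
  public void process(@Element Row row) {
    // Per-element access uses only the pre-resolved position.
    Object userId = row.getValue(userIdField);
  }
}));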
In fact, all of the schema transforms (Group, Join, etc.) do just this for
efficiency. They resolve the field names into field positions at expansion
time, and then always access by field id inside the actual transform.
Reuven
On Mon, May 6, 2019 at 2:01 AM Maximilian Michels wrote:
> *@mxm* approved this pull request.
>
> Thanks for the fix @reuvenlax <https://github.com/reuvenlax>! I still
> wonder whether we should prevent non-deterministic positional row access,
> but this can be handled independently of this PR.
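The pattern described in the comment above (resolve field names to positions once, at expansion time, then access rows purely by position inside the transform) can be sketched without Beam. `SimpleSchema`, `SimpleRow`, and `ExtractField` are hypothetical stand-ins, not Beam classes; the constructor of `ExtractField` plays the role of expansion time, and `process` plays the role of the per-element method.

```java
import java.util.Arrays;
import java.util.List;

class ResolveOnceDemo {
  public static void main(String[] args) {
    // The same logical record {userId="u1", count=7}, inferred with two field orders.
    SimpleSchema a = new SimpleSchema("userId", "count");
    SimpleSchema b = new SimpleSchema("count", "userId");
    System.out.println(new ExtractField(a, "userId").process(new SimpleRow("u1", 7))); // prints u1
    System.out.println(new ExtractField(b, "userId").process(new SimpleRow(7, "u1"))); // prints u1
  }
}

// Hypothetical stand-in for a schema: an ordered list of field names.
final class SimpleSchema {
  private final List<String> fieldNames;

  SimpleSchema(String... fieldNames) {
    this.fieldNames = Arrays.asList(fieldNames);
  }

  // Loosely mirrors the idea of a name -> position lookup; throws if the name is absent.
  int indexOf(String name) {
    int idx = fieldNames.indexOf(name);
    if (idx < 0) {
      throw new IllegalArgumentException("Unknown field: " + name);
    }
    return idx;
  }
}

// Hypothetical stand-in for a row: values stored in schema order.
final class SimpleRow {
  private final Object[] values;

  SimpleRow(Object... values) {
    this.values = values;
  }

  Object getValue(int idx) {
    return values[idx];
  }
}

// Hypothetical transform: resolves the field name once, then reads by position only.
final class ExtractField {
  private final int fieldIdx;

  ExtractField(SimpleSchema schema, String fieldName) {
    this.fieldIdx = schema.indexOf(fieldName); // resolved once, up front
  }

  Object process(SimpleRow row) {
    return row.getValue(fieldIdx); // no name lookup on the per-element path
  }
}
```

Because the position is looked up by name against whichever schema is actually in effect, both field orders yield the same value, and the per-element path never touches field names, which is the efficiency point made in the comment.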
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
Issue Time Tracking
-------------------
Worklog Id: (was: 238157)
Time Spent: 1h (was: 50m)
> Test ParDoSchemaTest#testInferredSchemaPipeline is flaky
> --------------------------------------------------------
>
> Key: BEAM-7210
> URL: https://issues.apache.org/jira/browse/BEAM-7210
> Project: Beam
> Issue Type: Bug
> Components: runner-flink, sdk-java-core
> Reporter: Maximilian Michels
> Assignee: Reuven Lax
> Priority: Major
> Fix For: 2.13.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Test ParDoSchemaTest#testInferredSchemaPipeline is flaky. Please see
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/3918/testReport/junit/org.apache.beam.sdk.transforms/ParDoSchemaTest/testInferredSchemaPipeline/].
> It seems like the backing row data is not populated in a deterministic way:
> {noformat}
> Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String
> at org.apache.beam.sdk.values.Row.getString(Row.java:279)
> at org.apache.beam.sdk.transforms.ParDoSchemaTest$12.process(ParDoSchemaTest.java:391)
> {noformat}
> CC [~reuvenlax]
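The ClassCastException in the report is the symptom a hard-coded positional getter produces when the field order backing a row differs from the order the caller assumed. The thread does not spell out the root cause; treat "the inferred field order varies between runs" as an assumption here. This stdlib-only sketch uses a hypothetical `getString` helper over a `List<Object>`, not Beam's `Row`, to reproduce the symptom:

```java
import java.util.Arrays;
import java.util.List;

class HardCodedPositionDemo {
  // Hypothetical helper mimicking a typed positional getter:
  // it casts whatever sits at that position to String.
  static String getString(List<Object> rowValues, int idx) {
    return (String) rowValues.get(idx);
  }

  public static void main(String[] args) {
    // The same logical record {userId="u1", count=7}, stored under two field orders.
    List<Object> orderA = Arrays.asList("u1", 7);
    List<Object> orderB = Arrays.asList(7, "u1");

    // Hard-coding position 0 as "userId" only works for one of the orders.
    System.out.println(getString(orderA, 0)); // prints u1
    try {
      getString(orderB, 0);
    } catch (ClassCastException e) {
      // Integer cannot be cast to String; exact message wording varies by JDK.
      System.out.println("ClassCastException: " + e.getMessage());
    }
  }
}
```

Resolving the position by field name before reading, as suggested in the PR discussion, removes the dependence on field order.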
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)