[ 
https://issues.apache.org/jira/browse/BEAM-7864?focusedWorklogId=302652&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-302652
 ]

ASF GitHub Bot logged work on BEAM-7864:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Aug/19 08:16
            Start Date: 28/Aug/19 08:16
    Worklog Time Spent: 10m 
      Work Description: iemejia commented on issue #9410: [BEAM-7864] 
Simplify/generalize Spark reshuffle translation
URL: https://github.com/apache/beam/pull/9410#issuecomment-525635634
 
 
   Thanks Kyle.
   
   @RyanSkraba raises some valid points. There is something odd in our 
current translation, namely that we ignore keys, in particular for the 
`Reshuffle.viaRandomKey()` case.
   We should probably file a JIRA to track this and discuss it on the mailing 
list (see some [previous discussion on Reshuffle 
here](https://lists.apache.org/thread.html/820064a81c86a6d44f21f0d6c68ea3f46cec151e5e1a0b52eeed3fbf@%3Cdev.beam.apache.org%3E)).
   
   I was also wondering to what extent, in our current implementation (and in 
particular for the random key case), we could repartition into more 
partitions (based on available CPUs). Of course this risks consuming 
more resources than the job defines, but on the other hand it could be a way 
to optimize such shuffles downstream. [But this is a different subject; I am 
just thinking out loud.]
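
To make the repartitioning idea above concrete, here is a minimal sketch in plain Java. It is not Beam or Spark runner code: the class `RandomKeyRepartitionSketch`, the `targetPartitions` policy (take the larger of the configured partition count and the CPU count) and the in-memory lists standing in for partitions are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustrative stand-in only; none of these names exist in Beam or Spark. */
final class RandomKeyRepartitionSketch {

  /**
   * Hypothetical policy: use at least one partition per available CPU,
   * but never fewer than the configured partition count.
   */
  static int targetPartitions(int configuredPartitions, int availableCpus) {
    return Math.max(configuredPartitions, availableCpus);
  }

  /**
   * Spreads elements over numPartitions buckets by pairing each element
   * with a random int key and bucketing on that key.
   */
  static <T> List<List<T>> repartition(List<T> elements, int numPartitions, Random random) {
    if (numPartitions < 1) {
      throw new IllegalArgumentException("numPartitions must be >= 1");
    }
    List<List<T>> partitions = new ArrayList<>(numPartitions);
    for (int i = 0; i < numPartitions; i++) {
      partitions.add(new ArrayList<>());
    }
    for (T element : elements) {
      int key = random.nextInt();
      // floorMod keeps the index non-negative even for negative keys.
      int index = Math.floorMod(key, numPartitions);
      partitions.get(index).add(element);
    }
    return partitions;
  }

  public static void main(String[] args) {
    int cpus = Runtime.getRuntime().availableProcessors();
    int numPartitions = targetPartitions(2, cpus);

    List<Integer> input = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      input.add(i);
    }

    List<List<Integer>> output = repartition(input, numPartitions, new Random());
    for (int p = 0; p < output.size(); p++) {
      System.out.println("partition " + p + ": " + output.get(p).size() + " elements");
    }
  }
}
```

The trade-off mentioned above shows up in `targetPartitions`: deriving the count from the machine rather than from the job configuration can spread work more evenly, but it can also exceed what the job asked for.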
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 302652)
    Time Spent: 2h 40m  (was: 2.5h)

> Portable Spark Reshuffle coder cast exception
> ---------------------------------------------
>
>                 Key: BEAM-7864
>                 URL: https://issues.apache.org/jira/browse/BEAM-7864
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>            Reporter: Kyle Weaver
>            Assignee: Kyle Weaver
>            Priority: Major
>              Labels: portability-spark
>             Fix For: 2.16.0
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> running :sdks:python:test-suites:portable:py35:portableWordCountBatch in 
> either loopback or docker mode on master fails with exception:
>  
> java.lang.ClassCastException: org.apache.beam.sdk.coders.LengthPrefixCoder 
> cannot be cast to org.apache.beam.sdk.coders.KvCoder
>  at 
> org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translateReshuffle(SparkBatchPortablePipelineTranslator.java:400)
>  at 
> org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translate(SparkBatchPortablePipelineTranslator.java:147)
>  at 
> org.apache.beam.runners.spark.SparkPipelineRunner.lambda$run$1(SparkPipelineRunner.java:96)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
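
The failure pattern in the stack trace above can be sketched in miniature: a translation step casts its input coder to a key-value coder without checking, and the cast fails when the input is wrapped in some other coder. The code below uses stand-in classes (`StubKvCoder`, `StubLengthPrefixCoder`, `StubBytesCoder`), not Beam's real coder classes, and the `instanceof` guard is only one possible defence, not necessarily what the pull request does.

```java
/** Illustrative stand-in only; these are not Beam's coder classes. */
final class ReshuffleCastSketch {

  interface Coder {}

  /** Stand-in for a coder of key-value pairs. */
  static final class StubKvCoder implements Coder {
    final Coder keyCoder;
    final Coder valueCoder;

    StubKvCoder(Coder keyCoder, Coder valueCoder) {
      this.keyCoder = keyCoder;
      this.valueCoder = valueCoder;
    }
  }

  /** Stand-in for a coder that wraps another coder with a length prefix. */
  static final class StubLengthPrefixCoder implements Coder {
    final Coder inner;

    StubLengthPrefixCoder(Coder inner) {
      this.inner = inner;
    }
  }

  /** Stand-in for a leaf coder. */
  static final class StubBytesCoder implements Coder {}

  /** Assumes the input is always KV-coded; throws ClassCastException otherwise. */
  static Coder valueCoderUnchecked(Coder inputCoder) {
    return ((StubKvCoder) inputCoder).valueCoder;
  }

  /** Unwraps only when the input really is KV-coded; otherwise keeps the coder as-is. */
  static Coder valueCoderChecked(Coder inputCoder) {
    if (inputCoder instanceof StubKvCoder) {
      return ((StubKvCoder) inputCoder).valueCoder;
    }
    return inputCoder;
  }

  public static void main(String[] args) {
    Coder prefixed = new StubLengthPrefixCoder(new StubBytesCoder());

    try {
      valueCoderUnchecked(prefixed);
    } catch (ClassCastException e) {
      System.out.println("unchecked cast failed: " + e.getMessage());
    }

    Coder result = valueCoderChecked(prefixed);
    System.out.println("checked variant returned: " + result.getClass().getSimpleName());
  }
}
```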



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
