Hi Aurelien,
Maybe @juri can help you out with the PythonWorkerManager implementation.
Best
--
Zoi

    On Tuesday, 25 March 2025 at 09:13:31 AM CET, Aurélien 
Bertrand <aurelien9.bertr...@gmail.com> wrote:  
 
 Dear all,

My name is Aurélien and I have been helping with the latest demo paper by
devising ParquetSource operators which I plan to commit soon.

While implementing the Spark forecast pipeline in Wayang, I noticed that
the JoinOperator implemented in Python does not work for me. It behaves
like a Cartesian product because it fails to obtain the keys, so the
probeTable in the JavaJoinOperator looks like {null: [all keys]} for both
data quanta.

I implemented a simple test case (the same as in TestJavaJoinOperator):

Left:
1,"b"
1,"c"
2,"d"
3,"e"

Right:
1,"x"
1,"y"
2,"z"
4,"w"

Has anyone tested the operator yet, or can anyone get the expected results
from this sample data?
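
For reference, here is a minimal plain-Python sketch of what an inner hash join should return on the sample data, and of what happens when every key comes back as null. It does not use Wayang or its Python API: hash_join, the key lambdas, and the variable names are hypothetical and only illustrate the probe-table behaviour described above.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Inner hash join: build a probe table on the left input, probe with the right."""
    probe_table = defaultdict(list)
    for l in left:
        probe_table[left_key(l)].append(l)
    # .get() avoids creating empty entries for keys that only exist on the right
    return [(l, r) for r in right for l in probe_table.get(right_key(r), [])]


left = [(1, "b"), (1, "c"), (2, "d"), (3, "e")]
right = [(1, "x"), (1, "y"), (2, "z"), (4, "w")]

# Keys extracted correctly: only rows with matching first fields are paired.
expected = hash_join(left, right, lambda t: t[0], lambda t: t[0])

# Keys lost (every key is None): the probe table collapses to {None: [all rows]},
# so every left row matches every right row.
broken = hash_join(left, right, lambda t: None, lambda t: None)

print(len(expected))  # 5 pairs: keys 1 (2 x 2) and 2 (1 x 1)
print(len(broken))    # 16 pairs: the full 4 x 4 Cartesian product
```

With working keys the join yields 5 pairs. With null keys it yields all 16 combinations, which matches the behaviour I am seeing.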

I checked the implementation: the keys are indeed extracted in the
JoinOperator and sent to worker.py, but then something seems to go wrong.
The PythonWorkerManager reads the data quanta correctly but not the keys.
Are the keys supposed to be read there, and does anyone know what could go
wrong (e.g., when reading simple data like [1])?

Thank you in advance for your help.

Best regards,
Aurélien
  
