Dear all,

My name is Aurélien, and I have been helping with the latest demo paper by devising ParquetSource operators, which I plan to commit soon.
While implementing the Spark forecast pipeline in Wayang, I noticed that the JoinOperator implemented in Python was not working for me. It behaves like a cartesian product because it fails to get the keys (and thus the probeTable in the JavaJoinOperator looks like {null: [all data quanta]} for both inputs).

I put together a simple test case (the same data as TestJavaJoinOperator):

Left:
  1,"b"
  1,"c"
  2,"d"
  3,"e"

Right:
  1,"x"
  1,"y"
  2,"z"
  4,"w"

I was wondering whether someone has already tested this operator, or whether anyone can get the expected results from this sample data.

I checked the implementation: the keys are indeed extracted in the JoinOperator and sent to worker.py, and then something seems to go wrong. The PythonWorkerManager does read the data quanta, but not the keys. Are the keys supposed to be read there, and does anyone know what could go wrong (e.g., when reading simple data like [1])?

Thank you in advance for your help.

Best regards,
Aurélien
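
P.S. In case it helps, here is a tiny plain-Python sketch (not Wayang code; the key extractors are made up just for illustration) of the result I expect from a hash join on this data versus the cartesian product I actually observe:

left  = [(1, "b"), (1, "c"), (2, "d"), (3, "e")]
right = [(1, "x"), (1, "y"), (2, "z"), (4, "w")]

def hash_join(build, probe, build_key, probe_key):
    # Build side: group records by their join key.
    probe_table = {}
    for rec in build:
        probe_table.setdefault(build_key(rec), []).append(rec)
    # Probe side: emit one pair per matching key.
    return [(b, p) for p in probe for b in probe_table.get(probe_key(p), [])]

# Expected: keys extracted correctly -> 5 matched pairs (four for key 1, one for key 2).
print(hash_join(left, right, lambda r: r[0], lambda r: r[0]))

# Observed: key extraction yields None on both sides, so the probe table degenerates
# to {None: [all records]} and the join becomes a cartesian product (16 pairs here).
print(hash_join(left, right, lambda r: None, lambda r: None))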