CarloMariaProietti commented on PR #716: URL: https://github.com/apache/wayang/pull/716#issuecomment-4096941252
> Hi Carlo, I studied the new commits carefully — this is excellent progress! I noticed you used Java Records for both Row and Schema, which is exactly the direction I suggested on issue #514. The SparkSelectOperator using Dataset[Row] with functions::col is a clean implementation. Looking at SparkSelectOperator, I see that getSupportedInputChannels and getSupportedOutputChannels return empty lists — would DatasetChannel descriptors be the right choice here, to keep execution within the Dataset world and avoid RDD conversions? This connects to issue #362 about DataFrameChannel, which I was studying.

Hi, I am glad that you also think that using record classes might be a good choice; however, I would like to underline that the implementation included records even before your suggestion on issue #514. You are right that the execution should be kept in the Dataset world: Dataset&lt;Row&gt; is exactly the Spark implementation of the DataFrame abstraction, which is exactly what the new Wayang API should provide.
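To make the "Records for Row and Schema" idea concrete, here is a minimal, self-contained sketch of what such record classes could look like. This is an illustrative assumption, not the actual code from the PR — the field names, the `Object[]` payload, and the `indexOf` helper are all hypothetical choices made for this example:

```java
import java.util.List;

public class RecordSketch {
    // Hypothetical sketch: a Row as an immutable value carrier for one tuple.
    // The real PR's Row/Schema records may differ in shape and naming.
    record Row(Object[] values) {}

    // Hypothetical sketch: a Schema pairing column names with their Java types.
    record Schema(List<String> fieldNames, List<Class<?>> fieldTypes) {
        // Resolve a column name to its positional index within a Row.
        int indexOf(String name) {
            return fieldNames.indexOf(name);
        }
    }

    public static void main(String[] args) {
        Schema schema = new Schema(
                List.of("id", "name"),
                List.of(Integer.class, String.class));
        Row row = new Row(new Object[]{1, "alice"});

        // Look up the "name" column positionally via the schema.
        System.out.println(row.values()[schema.indexOf("name")]);
    }
}
```

Records fit this use case because they are shallowly immutable and give `equals`/`hashCode`/`toString` for free, which is convenient for tuple-like data flowing between operators.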
