CarloMariaProietti commented on PR #716: URL: https://github.com/apache/wayang/pull/716#issuecomment-4096941252
> Hi Carlo, I studied the new commits carefully — this is excellent progress! I noticed you used Java Records for both Row and Schema, which is exactly the direction I suggested on issue #514. The SparkSelectOperator using Dataset[Row] with functions::col is a clean implementation. Looking at SparkSelectOperator, I see that getSupportedInputChannels and getSupportedOutputChannels return empty lists — would DatasetChannel descriptors be the right choice here, to keep execution within the Dataset world and avoid RDD conversions? This connects to issue #362 about DataFrameChannel, which I was studying.

Hi, I am glad that you also think that using record classes might be a good choice; however, I would like to underline that the implementation included records even before your suggestion on issue #514. You are right that the execution should be kept in the Dataset world: Dataset&lt;Row&gt; is exactly the Spark implementation of the DataFrame abstraction, which is exactly what the new Wayang API should provide.
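To make the "Records for Row and Schema" idea concrete, here is a minimal, self-contained sketch of what such record classes could look like. This is an illustrative assumption, not the actual code from the PR — the field names, the `Object[]` payload, and the `indexOf` helper are all hypothetical choices made for this example:

```java
import java.util.List;

public class RecordSketch {
    // Hypothetical sketch: a Row as an immutable value carrier for one tuple.
    // The real PR's Row/Schema records may differ in shape and naming.
    record Row(Object[] values) {}

    // Hypothetical sketch: a Schema pairing column names with their Java types.
    record Schema(List<String> fieldNames, List<Class<?>> fieldTypes) {
        // Resolve a column name to its positional index within a Row.
        int indexOf(String name) {
            return fieldNames.indexOf(name);
        }
    }

    public static void main(String[] args) {
        Schema schema = new Schema(
                List.of("id", "name"),
                List.of(Integer.class, String.class));
        Row row = new Row(new Object[]{1, "alice"});

        // Look up the "name" column positionally via the schema.
        System.out.println(row.values()[schema.indexOf("name")]);
    }
}
```

Records fit this use case because they are shallowly immutable and give `equals`/`hashCode`/`toString` for free, which is convenient for tuple-like data flowing between operators.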
