[GitHub] [beam] aromanenko-dev commented on pull request #22446: Improved pipeline translation in SparkStructuredStreamingRunner

GitBox Wed, 14 Sep 2022 08:16:37 -0700


aromanenko-dev commented on PR #22446:
URL: https://github.com/apache/beam/pull/22446#issuecomment-1246923780


   @mosche I totally agree that the current name is not very practical: it's 
quite long and, even worse, very confusing, since it contains the word 
`Streaming` but this runner doesn't support streaming mode at all (we know 
the reasons, but it is what it is). 
   
   So it would be better to rename it, though I'm not sure about 
`SparkSqlRunner` as the new name. IMHO, it may also be confusing and create 
the false expectation that it supports only Spark (or Beam?) SQL pipelines. 
   
   I'd suggest the name `SparkDatasetRunner`, since it's based on the Spark 
Dataset API. This name is quite short and gives a basic idea of what to expect 
from this runner. The old runner could be called `SparkRDDRunner`, but let's 
keep it as it is: just `SparkRunner`.
   
   On the other hand, this renaming will require many incompatible changes, 
starting with new package and artifact names. However, I'm pretty sure that 
most users who run Beam pipelines on Spark still use the old, classic 
Spark(RDD)Runner. We can check this on user@ and Twitter, if needed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
