Raja10D commented on issue #4865: URL: https://github.com/apache/hop/issues/4865#issuecomment-2639435985
> "large" in the context of an Excel file is not what is considered "large" in the context of a Spark cluster. Reading/writing Excel files on a distributed Spark cluster sounds like a square peg, round hole problem to me, but it shouldn't be impossible. You'll need to check your Beam + Spark configuration to tweak your pipeline.

For additional context, I am using Hop 2.8.0 to create the pipelines, Apache Beam 2.50, Spark 3.4.4, and Java 11. The Spark Standalone cluster runs entirely on my local PC: both the master and the worker are on the same machine.
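Since the suggestion is to tweak the Beam + Spark configuration, a typical first step on a single-machine Standalone cluster is to raise driver/executor memory when a pipeline fails on a large file. This is only a hedged sketch: the master URL, memory sizes, and jar name below are illustrative assumptions, not known-good values for this pipeline.

```shell
# Sketch: re-submitting a Beam pipeline (exported from Hop) to a local
# Spark Standalone master with more memory headroom.
# All values here are placeholders to adjust for your setup.
spark-submit \
  --master spark://localhost:7077 \
  --driver-memory 4g \
  --executor-memory 4g \
  --conf spark.driver.maxResultSize=2g \
  your-hop-beam-pipeline.jar
```

Whether this helps depends on where the pipeline actually fails (driver vs. executor), so checking the Spark worker and driver logs first is worthwhile.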
