jasinliu commented on issue #554: URL: https://github.com/apache/incubator-graphar/issues/554#issuecomment-2257622631
> What is the idea behind pushing #561 to main? If for some reason we need a version of the Scala library without Spark, it can be done in a separate persistent branch. Also, I do not fully understand the problem. Under the hood, Spark relies on `org.apache.hadoop.fs.FileSystem`, which also has an implementation for the [local file system](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/LocalFileSystem.html).
>
> If the problem is dependency hell, why not solve it? There are plenty of tools for that in the JVM world, for example the maven-shade-plugin.
>
> #561 will completely break the PySpark bindings, because at the moment the bindings rely on `pyspark.sql.SparkSession._jvm`. If we merge it, we will have constantly failing CI pipelines.
>
> @jasinliu Could you explain the motivation behind it a bit more? What kind of problem or complications are you facing with Spark for the local FS?

I want to implement the CLI tool on top of the Spark dependencies. However, constructing a SparkSession by default requires starting a Spark instance, which is slow. My current idea is to read the info directly from the local file system and use a SparkSession only to read the data.
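To illustrate the idea, here is a minimal sketch (not the actual GraphAr API): the graph info YAML is parsed locally with a plain YAML parser such as snakeyaml, while the SparkSession is created lazily so the slow Spark startup only happens when data actually has to be read. The object and method names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.yaml.snakeyaml.Yaml
import java.nio.file.{Files, Paths}

object CliSketch {
  // Read the info YAML directly from the local file system,
  // so the CLI can inspect metadata without starting Spark.
  def loadInfo(path: String): java.util.Map[String, Object] = {
    val text = new String(Files.readAllBytes(Paths.get(path)), "UTF-8")
    new Yaml().load[java.util.Map[String, Object]](text)
  }

  // Build the SparkSession lazily: the Spark startup cost is paid
  // only when the CLI actually needs to read chunk data.
  lazy val spark: SparkSession = SparkSession.builder()
    .appName("graphar-cli")
    .master("local[*]")
    .getOrCreate()

  def readChunk(path: String): DataFrame =
    spark.read.parquet(path)
}
```

With this layout, metadata-only CLI commands stay fast, and the Spark instance is started at most once, on the first data read.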
