jasinliu commented on issue #554: URL: https://github.com/apache/incubator-graphar/issues/554#issuecomment-2257622631
> What is the idea behind pushing #561 to main? If for some reason we need a version of the Scala library without Spark, it can be done in a separate persistent branch. Also, I do not fully understand the problem. Under the hood, Spark relies on `org.apache.hadoop.fs.FileSystem`, which also has an implementation for the [local file system](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/LocalFileSystem.html).
>
> If the problem is dependency hell, why not solve it? There are plenty of tools for that in the JVM world, for example the maven-shade-plugin.
>
> #561 will completely break the PySpark bindings, because at the moment the bindings rely on `pyspark.sql.SparkSession._jvm`. If we merge it, we will have constantly failing CI pipelines.
>
> @jasinliu Could you explain the motivation behind it a bit more? What kind of problem or complications are you facing with Spark for the local FS?

I want to implement the CLI tool on top of the Spark dependencies. However, constructing a SparkSession by default requires starting a Spark instance, which is slow. My current idea is to read the info directly from the local file system and use a SparkSession only to read the data.
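To illustrate the idea, here is a minimal sketch (not the actual GraphAr API): the graph info YAML is parsed locally with a plain YAML parser such as snakeyaml, while the SparkSession is created lazily so the slow Spark startup only happens when data actually has to be read. The object and method names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.yaml.snakeyaml.Yaml
import java.nio.file.{Files, Paths}

object CliSketch {
  // Read the info YAML directly from the local file system,
  // so the CLI can inspect metadata without starting Spark.
  def loadInfo(path: String): java.util.Map[String, Object] = {
    val text = new String(Files.readAllBytes(Paths.get(path)), "UTF-8")
    new Yaml().load[java.util.Map[String, Object]](text)
  }

  // Build the SparkSession lazily: the Spark startup cost is paid
  // only when the CLI actually needs to read chunk data.
  lazy val spark: SparkSession = SparkSession.builder()
    .appName("graphar-cli")
    .master("local[*]")
    .getOrCreate()

  def readChunk(path: String): DataFrame =
    spark.read.parquet(path)
}
```

With this layout, metadata-only CLI commands stay fast, and the Spark instance is started at most once, on the first data read.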
