Thanks for letting us know. Yes, our source code is not meant to be compiled on Windows, and I didn't expect it to be so much trouble to get this jar. We will figure out a better way to solve this issue soon.
On Thu, Feb 11, 2021 at 1:46 AM Grégory Dugernier <g...@aloalto.com> wrote:

> In fact, you should let us know about your situation early on. In fact, you can download the GeoTools jars manually and copy them to the SPARK_HOME/jars/ folder... You don't have to compile the code. Download links are given in the comments: http://sedona.apache.org/download/GeoSpark-All-Modules-Maven-Central-Coordinates/#geotools-240

I did copy the GeoTools jars and added them to my cluster library, but the python-adapter didn't seem to find them in the FileStore. Placing the jars inside SPARK_HOME on the cluster means first determining where that environment variable points inside the DBFS architecture, then most likely adding them through CLI commands. This presented several short-term obstacles, but it also raised issues down the line, because we deploy our clusters through Terraform and not all developers have the elevated permissions needed to run CLI commands. A single compiled jar with all the dependencies inside can easily be deployed at cluster creation with a databricks_dbfs_file resource (https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs/resources/dbfs_file) and the library.jar property of databricks_cluster (https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs/resources/cluster#library-configuration-block). The jar ended up being a bit of a headache to produce, but it keeps things high-level and easier to maintain.

That is, of course, unless I'm missing the obvious and there was an easy way to add the GeoTools jars on the Databricks cluster and let sedona-python-adapter find them, which isn't entirely excluded.

On Thu, 11 Feb 2021 at 10:03, Jia Yu <ji...@apache.org> wrote:

Thanks, Gregory. I think this behavior is not expected. We will look into this.

In fact, you should have let us know about your situation early on.
In fact, you can download the GeoTools jars manually and copy them to the SPARK_HOME/jars/ folder... You don't have to compile the code. Download links are given in the comments: http://sedona.apache.org/download/GeoSpark-All-Modules-Maven-Central-Coordinates/#geotools-240

We should make our doc clearer.

On Thu, Feb 11, 2021 at 12:44 AM Grégory Dugernier <g...@aloalto.com> wrote:

Hi Jia,

After much sweat and tears, I went the long road and compiled the code locally. I'm working on Windows, so I had to change a few things in the POM.xml:

- When trying to compile just the python-adapter lib, Maven didn't like the dynamic versioning of sedona-core and sedona-sql, so I had to hardcode the current version.
- For some reason, Maven couldn't find spark-version-converter from within the python-adapter directory, so I decided to compile the full library. It might be possible to compile just the adapter; pushing further in that direction simply seemed like it would take longer.
- When trying to compile the full library, the attach-javadoc goal kept erroring out, even with the latest version of maven-javadoc-plugin, so I removed it entirely.

In the end I got the jar, uploaded it to Databricks, and it works like a charm so far.

I did, however, hit another issue: when using ShapefileReader.readToGeometryRDD(spark.sparkContext, file_url) to read multiple Shapefile files at once and then using the Adapter, same-named columns aren't combined in the resulting DataFrame (see example below). It might be normal RDD behavior (I have little experience using them instead of DataFrames), and I already found a workaround by creating multiple DataFrames and using union(), but I'd rather let you know in case it isn't the expected behavior.
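For the record, the union() workaround described above can be sketched roughly as follows. This is my own illustration, not Sedona API: the merged_schema/union_shapefile_dfs helpers and the null-padding are assumptions, and the cast to string assumes the shapefile attribute columns come through Adapter.toDf as strings (the geometry column is present in every file, so it is never padded).

```python
# Sketch of the "multiple DataFrames + union()" workaround. Each shapefile is
# read into its own DataFrame; columns a file lacks are padded with NULLs in a
# fixed order so the positional union() lines same-named columns up.

def merged_schema(column_lists):
    """Ordered union of column names across several DataFrames."""
    seen, merged = set(), []
    for columns in column_lists:
        for name in columns:
            if name not in seen:
                seen.add(name)
                merged.append(name)
    return merged

def union_shapefile_dfs(spark, file_urls):
    """Read each shapefile into a DataFrame and union them by column name."""
    # Imports deferred so the pure helper above works without a Spark runtime.
    from pyspark.sql import functions as F
    from sedona.core.formatMapper.shapefileParser import ShapefileReader
    from sedona.utils.adapter import Adapter

    dfs = [Adapter.toDf(ShapefileReader.readToGeometryRDD(spark.sparkContext, url), spark)
           for url in file_urls]
    columns = merged_schema([df.columns for df in dfs])
    # Pad missing columns with NULL strings, selecting in the same order for
    # every DataFrame so the positional union() aligns them correctly.
    aligned = [df.select([F.col(c) if c in df.columns
                          else F.lit(None).cast("string").alias(c)
                          for c in columns])
               for df in dfs]
    result = aligned[0]
    for df in aligned[1:]:
        result = result.union(df)
    return result
```

On Spark 3.1+, df.unionByName(other, allowMissingColumns=True) should achieve the same alignment without the manual padding; on Spark 3.0.1 (the version in this thread) that flag is not available.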
[image: image.png]

Regards,
Grégory

On Thu, 11 Feb 2021 at 07:58, Jia Yu <ji...@apache.org> wrote:

Hi Gregory,

Please let us know if you get your issue fixed. I know many of our users are also using Databricks clusters, so we are also interested in the solution.

Thanks,
Jia

On Wed, Feb 10, 2021 at 5:17 AM Grégory Dugernier <g...@aloalto.com> wrote:

Thank you for the quick reply!

It seems my particular situation is a bit more complex than that, since I'm running the notebook on a Databricks cluster: the default Spark config doesn't seem to allow adding jar repositories (GeoTools isn't on Maven Central), nor does creating a new SparkSession appear to work. I've tried downloading the jars and adding them to the cluster manually, but that doesn't seem to work either. At least I know where the issue is!

Thanks again for your help,
Regards

On Wed, 10 Feb 2021 at 12:22, Jia Yu <ji...@apache.org> wrote:

Hi Gregory,

Thanks for letting us know. This is not a bug. We cannot include the GeoTools jars due to license issues, but indeed we forgot to update the docs and the Jupyter notebook examples. I just updated them. Please read them here:

https://github.com/apache/incubator-sedona/blob/master/python/ApacheSedonaSQL.ipynb

(Make sure you disable the browser cache or open it in an incognito window.)
http://sedona.apache.org/download/overview/#install-sedona-python

In short, you need to add the following coordinates in the notebook:

spark = SparkSession. \
    builder. \
    appName('appName'). \
    config("spark.serializer", KryoSerializer.getName). \
    config("spark.kryo.registrator", SedonaKryoRegistrator.getName). \
    config("spark.jars.repositories",
           'https://repo.osgeo.org/repository/release,'
           'https://download.java.net/maven/2'). \
    config('spark.jars.packages',
           'org.apache.sedona:sedona-python-adapter-3.0_2.12:1.0.0-incubating,'
           'org.geotools:gt-main:24.0,'
           'org.geotools:gt-referencing:24.0,'
           'org.geotools:gt-epsg-hsql:24.0'). \
    getOrCreate()

On Wed, Feb 10, 2021 at 2:35 AM Grégory Dugernier <g...@aloalto.com> wrote:

Hello,

I've been trying to run Sedona for Python on Databricks for 2 days and I think I've stumbled upon a bug.

*Configuration*:

- Spark 3.0.1
- Scala 2.12
- Python 3.7

*Libraries*:

- apache-sedona (from PyPI)
- org.apache.sedona:sedona-python-adapter-3.0_2.12:1.0.0-incubating (from Maven)

*What I'm trying to do:*

I'm trying to load a series of Shapefile files into a DataFrame for geospatial analysis. See the code snippet below, based on your example notebook (https://github.com/apache/incubator-sedona/blob/master/python/ApacheSedonaCore.ipynb):

> from sedona.core.formatMapper.shapefileParser import ShapefileReader
> from sedona.register import SedonaRegistrator
> from sedona.utils.adapter import Adapter
>
> SedonaRegistrator.registerAll(spark)
> shape_rdd = ShapefileReader.readToGeometryRDD(spark.sparkContext, file_name)
> df = Adapter.toDf(shape_rdd, spark)

*Bug*:

ShapefileReader.readToGeometryRDD() currently throws the following error:

> Py4JJavaError: An error occurred while calling z:org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader.readToGeometryRDD.
> : java.lang.NoClassDefFoundError: org/opengis/referencing/FactoryException
>     at org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader.readToGeometryRDD(ShapefileReader.java:79)
>     at org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader.readToGeometryRDD(ShapefileReader.java:66)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
>     at py4j.Gateway.invoke(Gateway.java:295)
>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>     at py4j.GatewayConnection.run(GatewayConnection.java:251)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ClassNotFoundException: org.opengis.referencing.FactoryException
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
>     at com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader.loadClass(ClassLoaders.scala:151)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:352)

Adding the org.apache.sedona:sedona-core-3.0_2.12:1.0.0-incubating library from Maven doesn't solve the error. Adding the org.datasyslab:geospark:1.3.1 library from Maven solves the error, but it creates conflicts with the underlying org.locationtech.jts dependencies. This makes me think there is a missing OpenGIS dependency in the sedona-python-adapter.

Regards,
G. Dugernier

--

Grégory Dugernier
Software Engineer

g...@aloalto.com
+32 (0)484 11 26 09

www.aloalto.com
+32 (0)2 736 10 17

DISCLAIMER: The content of this e-mail message does not constitute a commitment of S.A. ALOALTO N.V. or its subsidiaries/affiliates. This e-mail and any attachments thereto may contain information which is confidential and/or protected by intellectual property rights and is intended for the designated recipient only. Any use of the information contained herein (including, but not limited to, total or partial reproduction, communication or distribution in any form) by persons other than the designated recipient(s) is prohibited. If an addressing or transmission error has misdirected this e-mail, please notify the author, either by telephone or by e-mail, and delete the material from any computer.