Thanks for letting us know. Yes, our source code is not supposed to be
compiled on Windows. I didn't expect it to be so much trouble to get this
jar. We will figure out a better way to solve this issue soon.

On Thu, Feb 11, 2021 at 1:46 AM Grégory Dugernier <g...@aloalto.com> wrote:

>> You should have let us know about your situation earlier. In fact, you
>> can download the GeoTools jars manually and copy them to the
>> SPARK_HOME/jars/ folder... You don't have to compile the code. Download
>> links are given in the comments:
>> http://sedona.apache.org/download/GeoSpark-All-Modules-Maven-Central-Coordinates/#geotools-240
>
>
> I did copy the GeoTools jars and added them to my cluster library, but the
> python-adapter didn't seem to find them in the FileStore. Placing the jars
> inside SPARK_HOME on the cluster means first figuring out where that
> environment variable points to inside the DBFS architecture, then most
> likely adding them through CLI commands. That represented several
> short-term obstacles, but it also raised issues down the line, because we
> deploy our clusters through Terraform and not all developers will have the
> elevated permissions needed to run CLI commands. A single compiled jar with
> all the dependencies included can easily be deployed at cluster creation
> with a databricks_dbfs_file
> <https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs/resources/dbfs_file>
> resource and the library.jar property of databricks_cluster
> <https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs/resources/cluster#library-configuration-block>.
> The jar ended up being a bit of a headache to produce, but it keeps things
> high-level and easier to maintain.
>
> That is, of course, unless I'm missing the obvious and there is an easy
> way to add the GeoTools jars to the Databricks cluster and let
> sedona-python-adapter find them, which isn't entirely out of the question.
>
> On Thu, 11 Feb 2021 at 10:03, Jia Yu <ji...@apache.org> wrote:
>
>> Thanks, Gregory. I think this behavior is not expected. We will look into
>> this.
>>
>> You should have let us know about your situation earlier. In fact, you
>> can download the GeoTools jars manually and copy them to the
>> SPARK_HOME/jars/ folder... You don't have to compile the code. Download
>> links are given in the comments:
>> http://sedona.apache.org/download/GeoSpark-All-Modules-Maven-Central-Coordinates/#geotools-240
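>>
>> A minimal sketch of that manual route (assuming the standard Maven layout
>> on repo.osgeo.org and that SPARK_HOME is set; check the exact jar list and
>> versions against the download page above):
>>
>> import os
>> import urllib.request
>>
>> GEOTOOLS_VERSION = "24.0"
>> ARTIFACTS = ["gt-main", "gt-referencing", "gt-epsg-hsql"]
>> BASE = "https://repo.osgeo.org/repository/release/org/geotools"
>>
>> jars_dir = os.path.join(os.environ["SPARK_HOME"], "jars")
>> for name in ARTIFACTS:
>>     jar = f"{name}-{GEOTOOLS_VERSION}.jar"
>>     url = f"{BASE}/{name}/{GEOTOOLS_VERSION}/{jar}"
>>     # Download each GeoTools jar straight into SPARK_HOME/jars/
>>     urllib.request.urlretrieve(url, os.path.join(jars_dir, jar))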
>>
>> We should make our docs clearer.
>>
>>
>> On Thu, Feb 11, 2021 at 12:44 AM Grégory Dugernier <g...@aloalto.com>
>> wrote:
>>
>>> Hi Jia,
>>>
>>> After much sweat and tears, I took the long road and compiled the code
>>> locally. I'm working on Windows, so I had to change a few things in the
>>> pom.xml:
>>>
>>>    - When trying to compile just the python-adapter lib, Maven didn't
>>>    like the dynamic versioning of sedona-core and sedona-sql, so I had to
>>>    hardcode the current version.
>>>    - For some reason, Maven couldn't find spark-version-converter from
>>>    within the python-adapter directory, so I decided to compile the full
>>>    library. It might be possible to compile only the adapter; pushing
>>>    further in that direction just seemed like it would take longer.
>>>    - When trying to compile the full library, the attach-javadoc goal
>>>    kept erroring out, even with the latest version of
>>>    maven-javadoc-plugin, so I removed it entirely.
>>>
>>> In the end, I got the jar, uploaded it to Databricks, and it works like a
>>> charm so far.
>>>
>>> I did, however, run into another issue: it seems that when I use
>>> *ShapefileReader.readToGeometryRDD(spark.sparkContext, file_url)* to read
>>> multiple Shapefiles at once and then use the Adapter, same-named columns
>>> aren't combined in the resulting DataFrame (see example below, followed by
>>> a sketch of the workaround). It might be normal RDD behavior (I have
>>> little experience using them instead of DataFrames), and I already found a
>>> workaround by creating multiple DataFrames and using union(), but I'd
>>> rather let you know in case it isn't the expected behavior.
>>> [image: image.png]
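>>>
>>> For reference, a minimal sketch of the union() workaround, assuming
>>> `file_urls` is a list of Shapefile folders that all share the same schema
>>> (the names here are placeholders, not my actual paths):
>>>
>>> from functools import reduce
>>>
>>> from sedona.core.formatMapper.shapefileParser import ShapefileReader
>>> from sedona.utils.adapter import Adapter
>>>
>>> dfs = []
>>> for file_url in file_urls:
>>>     # Read each Shapefile separately so its attribute columns are kept
>>>     rdd = ShapefileReader.readToGeometryRDD(spark.sparkContext, file_url)
>>>     dfs.append(Adapter.toDf(rdd, spark))
>>>
>>> # union() matches columns by position, which is fine here because all
>>> # of the per-file DataFrames share the same schema
>>> df = reduce(lambda left, right: left.union(right), dfs)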
>>>
>>> Regards,
>>> Grégory
>>>
>>> On Thu, 11 Feb 2021 at 07:58, Jia Yu <ji...@apache.org> wrote:
>>>
>>>> Hi Gregory,
>>>>
>>>> Please let us know if you get your issue fixed. I know many of our
>>>> users are also on Databricks clusters, so we are also interested in the
>>>> solution.
>>>>
>>>> Thanks,
>>>> Jia
>>>>
>>>> On Wed, Feb 10, 2021 at 5:17 AM Grégory Dugernier <g...@aloalto.com>
>>>> wrote:
>>>>
>>>>> Thank you for the quick reply!
>>>>>
>>>>> It seems my particular situation is a bit more complex than that,
>>>>> since I'm running the notebook on a Databricks cluster, and the default
>>>>> Spark config doesn't seem to allow adding more jar repositories (GeoTools
>>>>> isn't on Maven Central), nor does creating a new SparkSession appear to
>>>>> work. I've tried to download the jars and add them manually to the
>>>>> cluster, but that doesn't seem to work either. But at least I know where
>>>>> the issue is!
>>>>>
>>>>> Thanks again for your help,
>>>>> Regards
>>>>>
>>>>> On Wed, 10 Feb 2021 at 12:22, Jia Yu <ji...@apache.org> wrote:
>>>>>
>>>>>> Hi Gregory,
>>>>>>
>>>>>> Thanks for letting us know. This is not a bug. We cannot include the
>>>>>> GeoTools jars due to license issues, but indeed we forgot to update the
>>>>>> docs and Jupyter notebook examples. I just updated them. Please read them
>>>>>> here:
>>>>>>
>>>>>>
>>>>>> https://github.com/apache/incubator-sedona/blob/master/python/ApacheSedonaSQL.ipynb
>>>>>>
>>>>>> (Make sure you disable the browser cache or open it in an incognito
>>>>>> window)
>>>>>> http://sedona.apache.org/download/overview/#install-sedona-python
>>>>>>
>>>>>> In short, you need to add the following repositories and coordinates in
>>>>>> the notebook:
>>>>>>
>>>>>> spark = SparkSession. \
>>>>>>     builder. \
>>>>>>     appName('appName'). \
>>>>>>     config("spark.serializer", KryoSerializer.getName). \
>>>>>>     config("spark.kryo.registrator", SedonaKryoRegistrator.getName). \
>>>>>>     config("spark.jars.repositories",
>>>>>>            'https://repo.osgeo.org/repository/release,'
>>>>>>            'https://download.java.net/maven/2'). \
>>>>>>     config('spark.jars.packages',
>>>>>>            'org.apache.sedona:sedona-python-adapter-3.0_2.12:1.0.0-incubating,'
>>>>>>            'org.geotools:gt-main:24.0,'
>>>>>>            'org.geotools:gt-referencing:24.0,'
>>>>>>            'org.geotools:gt-epsg-hsql:24.0'). \
>>>>>>     getOrCreate()
>>>>>>
>>>>>> On Wed, Feb 10, 2021 at 2:35 AM Grégory Dugernier <g...@aloalto.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I've been trying to run Sedona for Python on Databricks for 2 days and
>>>>>>> I think I've stumbled upon a bug.
>>>>>>>
>>>>>>> *Configuration*:
>>>>>>>
>>>>>>>    - Spark 3.0.1
>>>>>>>    - Scala 2.12
>>>>>>>    - Python 3.7
>>>>>>>
>>>>>>> *Libraries*:
>>>>>>>
>>>>>>>    - apache-sedona (from PyPI)
>>>>>>>    - org.apache.sedona:sedona-python-adapter-3.0_2.12:1.0.0-incubating
>>>>>>>      (from Maven)
>>>>>>>
>>>>>>> *What I'm trying to do:*
>>>>>>>
>>>>>>> I'm trying to load a series of Shapefiles into a dataframe for
>>>>>>> geospatial analysis. See the code snippet below, based on your example
>>>>>>> notebook
>>>>>>> <https://github.com/apache/incubator-sedona/blob/master/python/ApacheSedonaCore.ipynb>
>>>>>>>
>>>>>>>
>>>>>>> > from sedona.core.formatMapper.shapefileParser import ShapefileReader
>>>>>>> > from sedona.register import SedonaRegistrator
>>>>>>> > from sedona.utils.adapter import Adapter
>>>>>>> >
>>>>>>> > SedonaRegistrator.registerAll(spark)
>>>>>>> > shape_rdd = ShapefileReader.readToGeometryRDD(spark.sparkContext, file_name)
>>>>>>> > df = Adapter.toDf(shape_rdd, spark)
>>>>>>> >
>>>>>>>
>>>>>>> *Bug*:
>>>>>>>
>>>>>>> The ShapefileReader.readToGeometryRDD() currently throws the following
>>>>>>> error:
>>>>>>>
>>>>>>> > Py4JJavaError: An error occurred while calling z:org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader.readToGeometryRDD.
>>>>>>> > : java.lang.NoClassDefFoundError: org/opengis/referencing/FactoryException
>>>>>>> >   at org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader.readToGeometryRDD(ShapefileReader.java:79)
>>>>>>> >   at org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader.readToGeometryRDD(ShapefileReader.java:66)
>>>>>>> >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>> >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>> >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>> >   at java.lang.reflect.Method.invoke(Method.java:498)
>>>>>>> >   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>>>>>>> >   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
>>>>>>> >   at py4j.Gateway.invoke(Gateway.java:295)
>>>>>>> >   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>>>>>>> >   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>>>>>> >   at py4j.GatewayConnection.run(GatewayConnection.java:251)
>>>>>>> >   at java.lang.Thread.run(Thread.java:748)
>>>>>>> > Caused by: java.lang.ClassNotFoundException: org.opengis.referencing.FactoryException
>>>>>>> >   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>>>>>>> >   at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
>>>>>>> >   at com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader.loadClass(ClassLoaders.scala:151)
>>>>>>> >   at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
>>>>>>>
>>>>>>>
>>>>>>> Adding the org.apache.sedona:sedona-core-3.0_2.12:1.0.0-incubating
>>>>>>> library from Maven doesn't solve the error. Adding the
>>>>>>> org.datasyslab:geospark:1.3.1 library from Maven solves the error, but it
>>>>>>> creates conflicts with the underlying org.locationtech.jts dependencies.
>>>>>>> This makes me think there is a missing OpenGIS dependency in the
>>>>>>> sedona-python-adapter.
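>>>>>>>
>>>>>>> In case it helps reproduce this, one quick way to check from the notebook
>>>>>>> whether the OpenGIS classes are visible on the driver classpath (this just
>>>>>>> goes through the Py4J gateway, not any Sedona API):
>>>>>>>
>>>>>>> # Fails with the same ClassNotFoundException when the GeoTools/OpenGIS
>>>>>>> # jars are missing from the driver classpath
>>>>>>> spark._jvm.java.lang.Class.forName(
>>>>>>>     "org.opengis.referencing.FactoryException")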
>>>>>>>
>>>>>>> Regards,
>>>>>>> G. Dugernier
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
> --
>
>
>
> Grégory Dugernier
> Software Engineer
>
> g...@aloalto.com <f...@aloalto.com>
> +32 (0)484 11 26 09
>
> www.aloalto.com
> +32 (0)2 736 10 17
>
>
