To continue along Szehon's line of thought:

I am really excited that the Parquet and Iceberg communities have adopted
geospatial logical types, and of course I am grateful for the work that went
into that effort.

As both Wenchen and Szehon pointed out in their own way, the goal is to have 
minimal support in Spark, as a common platform, for these types.

To be more specific: the proposal's scope is to add support for reading and
writing geospatial values to Parquet, based on the new standard, and to add the
types as built-in types in Spark to complement the storage support. The few ST
expressions in the proposal are what seem to be the minimal set needed to work
with geospatial values in the Spark engine in a meaningful way.
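
As a rough illustration of the kind of value that expressions such as
ST_GeomFromWKB and ST_AsBinary exchange, here is a minimal Python sketch of the
well-known binary (WKB) layout for a single 2D point. It is not Spark code and
not the proposed implementation; the function names point_to_wkb and
wkb_to_point are hypothetical, and only the plain 2D Point case is handled.

```python
import struct


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (hypothetical helper)."""
    # Layout: 1 byte order flag (1 = little-endian),
    #         uint32 geometry type (1 = Point),
    #         two float64 coordinates (x, then y).
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(wkb: bytes) -> tuple[float, float]:
    """Decode a 2D WKB point, honoring the byte order flag (hypothetical helper)."""
    if len(wkb) != 21:
        raise ValueError("a 2D WKB point is exactly 21 bytes")
    # The first byte says how the remaining fields are encoded.
    prefix = "<" if wkb[0] == 1 else ">"
    geom_type, x, y = struct.unpack(prefix + "Idd", wkb[1:])
    if geom_type != 1:
        raise ValueError("only the 2D Point type (1) is handled in this sketch")
    return x, y


if __name__ == "__main__":
    encoded = point_to_wkb(1.0, 2.0)
    print(encoded.hex())
    print(wkb_to_point(encoded))
```

The byte order flag is read back on decode so that both little-endian and
big-endian inputs round-trip.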

Best,

Menelaos


> On Mar 29, 2025, at 12:06 PM, Szehon Ho <szehon.apa...@gmail.com> wrote:
> 
> Thank you Menelaos, will do!
> 
> To give a little background: Jia and the Sedona community, the GeoParquet
> community, and others put a great deal of effort into defining the Parquet
> and Iceberg geo types, which couldn't have been done without their
> experience and help!
> 
> But I do agree with Wenchen: now that the types are in the most common data
> sources in the ecosystem, I think Apache Spark, as a common platform, needs to
> have this type definition for interop; otherwise users of vanilla Spark
> cannot work with data sources that store geospatial data. (IMO, a similar
> rationale applies to adding timestamp nano in the other ongoing SPIP.)
> 
> And like Wenchen said, the SPIP’s goal doesn’t seem to be to fragment the
> ecosystem by implementing Sedona’s advanced geospatial analytics tech in Spark
> itself, which, you may be right, belongs in pluggable frameworks. Menelaos may
> explain more about the SPIP goal.
> 
> I do hope there can be more collaboration across communities (as in the
> Iceberg/Parquet collaboration), drawing on the Sedona community’s experience
> to make sure these type definitions are optimal and compatible with Sedona.
> 
> Thanks!
> Szehon
> 
> 
>> On Mar 29, 2025, at 8:04 AM, Menelaos Karavelas 
>> <menelaos.karave...@gmail.com> wrote:
>> 
>> 
>> Hello Szehon,
>> 
>> I just created a Google doc and also linked it in the JIRA:
>> 
>> https://docs.google.com/document/d/1cYSNPGh95OjnpS0k_KDHGM9Ae3j-_0Wnc_eGBZL4D3w/edit?tab=t.0
>> 
>> Please feel free to comment on it.
>> 
>> Best,
>> 
>> Menelaos
>> 
>> 
>>> On Mar 28, 2025, at 2:19 PM, Szehon Ho <szehon.apa...@gmail.com> wrote:
>>> 
>>> Thanks Menelaos, this is exciting! Is there a Google doc we can comment on,
>>> or just the JIRA?
>>> 
>>> Thanks
>>> Szehon
>>> 
>>> On Fri, Mar 28, 2025 at 1:41 PM Ángel Álvarez Pascua
>>> <angel.alvarez.pas...@gmail.com> wrote:
>>>> Sorry, I only had a quick look at the proposal; I looked for WKT and didn't
>>>> find anything.
>>>> 
>>>> It's been years since I worked on geospatial projects and I'm not an
>>>> expert (at all). Maybe start with something simple but useful, like
>>>> WKT<=>WKB conversion?
>>>> 
>>>> 
>>>> On Fri, Mar 28, 2025, 21:27, Menelaos Karavelas
>>>> <menelaos.karave...@gmail.com> wrote:
>>>>> In the SPIP Jira the proposal is to add the expressions ST_AsBinary, 
>>>>> ST_GeomFromWKB, and ST_GeogFromWKB.
>>>>> Is there anything else that you think should be added?
>>>>> 
>>>>> Regarding WKT, what do you think should be added?
>>>>> 
>>>>> - Menelaos
>>>>> 
>>>>> 
>>>>>> On Mar 28, 2025, at 1:02 PM, Ángel Álvarez Pascua
>>>>>> <angel.alvarez.pas...@gmail.com> wrote:
>>>>>> 
>>>>>> What about adding support for WKT
>>>>>> <https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry>
>>>>>> and WKB
>>>>>> <https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary>?
>>>>>> 
>>>>>> On Fri, Mar 28, 2025 at 20:50, Ángel Álvarez Pascua
>>>>>> <angel.alvarez.pas...@gmail.com> wrote:
>>>>>>> +1 (non-binding)
>>>>>>> 
>>>>>>> On Fri, Mar 28, 2025, 18:48, Menelaos Karavelas
>>>>>>> <menelaos.karave...@gmail.com> wrote:
>>>>>>>> Dear Spark community,
>>>>>>>> 
>>>>>>>> I would like to propose the addition of new geospatial data types
>>>>>>>> (GEOMETRY and GEOGRAPHY) that represent geospatial values,
>>>>>>>> corresponding to the logical types recently added to the Parquet
>>>>>>>> specification.
>>>>>>>> 
>>>>>>>> The new types should improve Spark’s ability to read the new Parquet 
>>>>>>>> logical types and perform some minimal meaningful operations on them.
>>>>>>>> 
>>>>>>>> SPIP: https://issues.apache.org/jira/browse/SPARK-51658
>>>>>>>> 
>>>>>>>> Looking forward to your comments and feedback.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Best regards,
>>>>>>>> 
>>>>>>>> Menelaos Karavelas
>>>>>>>> 
>>>>> 
>> 
