Thanks all for taking the time to respond. Daniel, I didn't know that Solr
uses JTS. This is a good finding and we can definitely ask them whether
there is a workaround we can use. Jonathan, I had the same idea of
serializing/deserializing a bytearray each time a UDF is called. The
deserialization part is also good for letting Pig auto-detect spatial types
if they are not set explicitly in the schema. What is the best way to start
this? I want to add an initial set of JIRA issues and start working on them,
but I also need to keep the work grouped in some way for organization.
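
To make the bytearray round trip concrete, here is a minimal sketch of what
each UDF call would have to do. It is only an illustration under assumptions:
it handles a single 2D point in the standard Well Known Binary (WKB) layout
using plain java.nio, and the class and method names (PointWkb, encode,
decode) are hypothetical. In the real extension the bytes would presumably
travel inside Pig's DataByteArray, and JTS's WKBWriter/WKBReader would cover
all geometry types.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper for illustration only; not part of Pig or JTS. */
final class PointWkb {
    private static final byte NDR = 1;       // WKB byte-order flag: little endian
    private static final int WKB_POINT = 1;  // WKB geometry type code for Point
    private static final int SIZE = 1 + 4 + 8 + 8; // flag + type + x + y

    private PointWkb() {}

    /** Serializes a 2D point to WKB; a UDF would do this before returning. */
    static byte[] encode(double x, double y) {
        ByteBuffer buf = ByteBuffer.allocate(SIZE).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(NDR).putInt(WKB_POINT).putDouble(x).putDouble(y);
        return buf.array();
    }

    /** Parses WKB back into {x, y}; a UDF would do this on every input value. */
    static double[] decode(byte[] wkb) {
        if (wkb == null || wkb.length != SIZE) {
            throw new IllegalArgumentException("not a 2D WKB point");
        }
        // The first byte says which byte order the remaining fields use.
        ByteOrder order = (wkb[0] == NDR) ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN;
        ByteBuffer buf = ByteBuffer.wrap(wkb).order(order);
        buf.get(); // skip the byte-order flag
        if (buf.getInt() != WKB_POINT) {
            throw new IllegalArgumentException("geometry type is not Point");
        }
        double x = buf.getDouble();
        double y = buf.getDouble();
        return new double[] {x, y};
    }
}
```

The type check in decode is also where auto-detection could hook in: a
bytearray that parses as valid WKB can be treated as a geometry even when the
schema does not say so.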

Thanks
Ahmed

Best regards,
Ahmed Eldawy


On Sat, May 4, 2013 at 4:47 PM, Jonathan Coveney <jcove...@gmail.com> wrote:

> I agree that this is cool, and if other projects are using JTS it is worth
> talking to them to see how. I also agree that licensing is very frustrating.
>
> In the short term, however, while it is annoying to have to manage the
> serialization and deserialization yourself, you can have the geometry type
> be passed around as a bytearray type. Your UDFs will have to know this and
> treat it accordingly, but if you did this then all of the tools could be in
> an external project on github instead of a branch in Pig. Then, if we can
> get the licensing done, we could add the Geometry type to Pig. Adding
> types, honestly, is kind of tedious but not super difficult, so once the
> rest is done, that shouldn't be too difficult.
>
>
> 2013/5/4 Russell Jurney <russell.jur...@gmail.com>
>
> > If a way could be found, this would be an awesome addition to Pig.
> >
> > Russell Jurney http://datasyndrome.com
> >
> > On May 3, 2013, at 4:09 PM, Daniel Dai <da...@hortonworks.com> wrote:
> >
> > > I am not sure how other Apache projects are dealing with it. It seems
> > > Solr also has some connector to JTS?
> > >
> > > Thanks,
> > > Daniel
> > >
> > >
> > > On Thu, May 2, 2013 at 11:59 AM, Ahmed Eldawy <aseld...@gmail.com>
> > > wrote:
> > >
> > > >> Thanks Alan for your interest. It's too bad that an open source licensing
> > > >> issue is holding me back from doing some open source work. I understand the
> > > >> issue and your workarounds make sense. However, as I mentioned in the
> > > >> beginning, I don't want to have my own branch of Pig because it makes my
> > > >> extension less portable. I'll think of another way to do it. I'll ask Vivid
> > > >> Solutions if they can dual license their code, although I think the answer
> > > >> will be no. I'll also think of a way to ship my extension as a set of jar
> > > >> files without the need to change the core of Pig. This way, it can be
> > > >> easily ported to newer versions of Pig.
> > >>
> > >> Thanks
> > >> Ahmed
> > >>
> > >> Best regards,
> > >> Ahmed Eldawy
> > >>
> > >>
> > >> On Thu, May 2, 2013 at 12:33 PM, Alan Gates <ga...@hortonworks.com>
> > > >> wrote:
> > >>
> > > >>> I know this is frustrating, but the different licenses do have different
> > > >>> requirements that make it so that Apache can't ship GPL code.  A legal
> > > >>> explanation is at http://www.apache.org/licenses/GPL-compatibility.html
> > > >>> For additional info on the LGPL specific questions see
> > > >>> http://www.apache.org/legal/3party.html
> > >>>
> > > >>> As far as pulling it in via ivy, the issue isn't so much where the code
> > > >>> lives as much as what code we are requiring to make Pig work.  If something
> > > >>> that is [L]GPL is required for Pig it violates Apache rules as outlined
> > > >>> above.  It also would be a show stopper for a lot of companies that
> > > >>> redistribute Pig and that are allergic to GPL software.
> > > >>>
> > > >>> So, as I said before, if you wanted to continue with that library and they
> > > >>> are not willing to relicense it then it would have to be bolted on after
> > > >>> Apache Pig is built.  Nothing stops you from doing this by downloading
> > > >>> Apache Pig, adding this library and your code, and redistributing, though
> > > >>> it wouldn't then be open to all Pig users.
> > >>>
> > >>> Alan.
> > >>>
> > >>> On May 1, 2013, at 6:08 PM, Ahmed Eldawy wrote:
> > >>>
> > > >>>> Thanks for your response. I was never good at differentiating all those
> > > >>>> open source licenses. I mean, what is the point of making open source
> > > >>>> licenses if they block me from using a library in an open source project?
> > > >>>> Anyway, I'm not going into a debate here. Just one question: if we use JTS
> > > >>>> as a library (jar file) without adding the code in Pig, is it still a
> > > >>>> violation? We'll use ivy, for example, to download the jar file when
> > > >>>> compiling.
> > >>>> On May 1, 2013 7:50 PM, "Alan Gates" <ga...@hortonworks.com> wrote:
> > >>>>
> > > >>>>> Passing on the technical details for a moment, I see a licensing issue.
> > > >>>>> JTS is licensed under LGPL.  Apache projects cannot contain or ship
> > > >>>>> [L]GPL.  Apache does not meet the requirements of GPL and thus we cannot
> > > >>>>> repackage their code. If you wanted to go forward using that class this
> > > >>>>> would have to be packaged as an add-on that was downloaded separately and
> > > >>>>> not from Apache.  Another option is to work with the JTS community and see
> > > >>>>> if they are willing to dual license their code under BSD or Apache license
> > > >>>>> so that Pig could include it.  If neither of those is an option you would
> > > >>>>> need to come up with a new class to contain your spatial data.
> > >>>>>
> > >>>>> Alan.
> > >>>>>
> > >>>>> On May 1, 2013, at 5:40 PM, Ahmed Eldawy wrote:
> > >>>>>
> > > >>>>>> Hi all,
> > > >>>>>> First, sorry for the long email. I wanted to put all my thoughts here and
> > > >>>>>> get your feedback.
> > > >>>>>> I'm proposing a major addition to Pig that will greatly increase its
> > > >>>>>> functionality and user base. It is simply to add spatial support to the
> > > >>>>>> language and the framework. I've already started working on that but I
> > > >>>>>> don't want it to be just another branch. I want it, eventually, to be
> > > >>>>>> merged with the trunk of Apache Pig. So, I'm sending this email mainly to
> > > >>>>>> reach out to the main contributors of Pig to see the feasibility of this.
> > > >>>>>> This addition is a part of a big project we have been working on at the
> > > >>>>>> University of Minnesota; the project is called Spatial Hadoop
> > > >>>>>> (http://spatialhadoop.cs.umn.edu). It's about building a MapReduce
> > > >>>>>> framework (Hadoop) that is capable of maintaining and analyzing spatial
> > > >>>>>> data efficiently. I'm the main guy behind that project and since we
> > > >>>>>> released its first version, we received very encouraging responses from
> > > >>>>>> different groups in the research and industrial community. I'm sure the
> > > >>>>>> addition we want to make to Pig Latin will be widely accepted by the
> > > >>>>>> people in the spatial community.
> > > >>>>>> I'm proposing a plan here while we're still in the early phases of this
> > > >>>>>> task to be able to discuss it with the main contributors and see its
> > > >>>>>> feasibility. First of all, I think that we need to change the core of Pig
> > > >>>>>> to be able to support spatial data. Providing a set of UDFs only is not
> > > >>>>>> enough. The main reason is that Pig Latin does not provide a way to create
> > > >>>>>> a new data type, which is needed for spatial data. Once we have the
> > > >>>>>> spatial data types we need, the functionality can be expanded using more
> > > >>>>>> UDFs.
> > >>>>>>
> > > >>>>>> Here's the plan as I see it.
> > > >>>>>> 1- Introduce a new primitive data type Geometry which represents all
> > > >>>>>> spatial data types. In the underlying system, this will map to
> > > >>>>>> com.vividsolutions.jts.geom.Geometry. This is a class from Java Topology
> > > >>>>>> Suite (JTS) [http://www.vividsolutions.com/jts/JTSHome.htm], a stable and
> > > >>>>>> efficient open source Java library for spatial data types and algorithms.
> > > >>>>>> It is very popular in the spatial community and a C++ port of it is used
> > > >>>>>> in PostGIS [http://postgis.net/] (a spatial library for Postgres). JTS
> > > >>>>>> also conforms to the open standard for spatial data types published by
> > > >>>>>> the Open Geospatial Consortium (OGC) [http://www.opengeospatial.org/].
> > > >>>>>> The Geometry data type is read from and written to text files using the
> > > >>>>>> Well Known Text (WKT) format. There is also a way to convert it to/from
> > > >>>>>> binary so that it can work with binary files and streams.
> > > >>>>>> 2- Add functions that manipulate spatial data types. These will be added
> > > >>>>>> as UDFs and we will not need to mess with the internals of Pig. Most
> > > >>>>>> probably, there will be one new class for each operation (e.g., union or
> > > >>>>>> intersection). I think it will be good to put these new operations inside
> > > >>>>>> the core of Pig so that users can use them without having to write the
> > > >>>>>> fully qualified class name. Also, since there is no way to implicitly cast
> > > >>>>>> a spatial data type to a non-spatial data type, there will not be any
> > > >>>>>> conflicts in existing operations or new operations. All new operations,
> > > >>>>>> and only the new operations, will be working on spatial data types. Here
> > > >>>>>> is an initial list of operations that can be added. All those operations
> > > >>>>>> are already implemented in JTS and the UDFs added to Pig will be just
> > > >>>>>> wrappers around them.
> > >>>>>> **Predicates (used for spatial filtering)
> > >>>>>> Equals
> > >>>>>> Disjoint
> > >>>>>> Intersects
> > >>>>>> Touches
> > >>>>>> Crosses
> > >>>>>> Within
> > >>>>>> Contains
> > >>>>>> Overlaps
> > >>>>>>
> > >>>>>> **Operations
> > >>>>>> Envelope
> > >>>>>> Area
> > >>>>>> Length
> > >>>>>> Buffer
> > >>>>>> ConvexHull
> > >>>>>> Intersection
> > >>>>>> Union
> > >>>>>> Difference
> > >>>>>> SymDifference
> > >>>>>>
> > >>>>>> **Aggregate functions
> > >>>>>> Accum
> > >>>>>> ConvexHull
> > >>>>>> Union
> > >>>>>>
> > > >>>>>> 3- The third step is to implement spatial indexes (e.g., Grid or R-tree).
> > > >>>>>> A Pig loader and Pig output classes will be created for those indexes.
> > > >>>>>> Note that currently we have SpatialOutputFormat and SpatialInputFormat for
> > > >>>>>> those indexes inside the Spatial Hadoop project, but we need to tweak them
> > > >>>>>> to work with Pig.
> > >>>>>>
> > > >>>>>> 4- (Advanced) Implement more sophisticated algorithms for spatial
> > > >>>>>> operations that utilize the indexes. For example, we can have a specific
> > > >>>>>> algorithm for spatial range query or spatial join. Again, we already have
> > > >>>>>> algorithms built for different operations implemented in Spatial Hadoop as
> > > >>>>>> MapReduce programs, but they will need to be modified to work in the Pig
> > > >>>>>> environment and to work with other operations.
> > >>>>>>
> > > >>>>>> This is my whole plan for the spatial extension to Pig. I've already
> > > >>>>>> started with the first step but, as I mentioned earlier, I don't want to
> > > >>>>>> do the work for our project and then have the work get forgotten. I want
> > > >>>>>> to contribute to Pig and do my research at the same time. If you think the
> > > >>>>>> plan is plausible, I'll open JIRA issues for the above tasks and start
> > > >>>>>> shipping patches to do the stuff. I'll conform with the standards of the
> > > >>>>>> project such as adding tests and commenting the code well.
> > > >>>>>> Sorry for the long email and hope to hear back from you.
> > >>>>>>
> > >>>>>>
> > >>>>>> Best regards,
> > >>>>>> Ahmed Eldawy
> > >>>>>
> > >>>>>
> > >>>
> > >>>
> > >>
> >
>
