You can give them all the same label or tag and filter on that later on.
2013/5/6 Ahmed Eldawy <aseld...@gmail.com>

> Thanks all for taking the time to respond. Daniel, I didn't know that Solr
> uses JTS. This is a good finding and we can definitely ask them to see if
> there is a workaround we can use. Jonathan, I thought of the same idea of
> serializing/deserializing a bytearray each time a UDF is called. The
> deserialization part is good for letting Pig auto-detect spatial types if
> not set explicitly in the schema. What is the best way to start this? I
> want to add an initial set of JIRA issues and start working on them, but I
> also need to keep the work grouped in some sense just for organization.
>
> Thanks
> Ahmed
>
> Best regards,
> Ahmed Eldawy
>
>
> On Sat, May 4, 2013 at 4:47 PM, Jonathan Coveney <jcove...@gmail.com>
> wrote:
>
> > I agree that this is cool, and if other projects are using JTS it is
> > worth talking to them to see how. I also agree that licensing is very
> > frustrating.
> >
> > In the short term, however, while it is annoying to have to manage the
> > serialization and deserialization yourself, you can have the geometry
> > type be passed around as a bytearray type. Your UDFs will have to know
> > this and treat it accordingly, but if you did this then all of the tools
> > could be in an external project on GitHub instead of a branch in Pig.
> > Then, if we can get the licensing done, we could add the Geometry type
> > to Pig. Adding types, honestly, is kind of tedious but not super
> > difficult, so once the rest is done, that shouldn't be too difficult.
> >
> >
> > 2013/5/4 Russell Jurney <russell.jur...@gmail.com>
> >
> > > If a way could be found, this would be an awesome addition to Pig.
> > >
> > > Russell Jurney http://datasyndrome.com
> > >
> > > On May 3, 2013, at 4:09 PM, Daniel Dai <da...@hortonworks.com> wrote:
> > >
> > > > I am not sure how other Apache projects are dealing with it. Seems
> > > > Solr also has some connector to JTS?
> > > >
> > > > Thanks,
> > > > Daniel
> > > >
> > > >
> > > > On Thu, May 2, 2013 at 11:59 AM, Ahmed Eldawy <aseld...@gmail.com>
> > > > wrote:
> > > >
> > > >> Thanks Alan for your interest. It's too bad that an open source
> > > >> licensing issue is holding me back from doing some open source work.
> > > >> I understand the issue and your workarounds make sense. However, as
> > > >> I mentioned in the beginning, I don't want to have my own branch of
> > > >> Pig because it makes my extension less portable. I'll think of
> > > >> another way to do it. I'll ask Vivid Solutions if they can
> > > >> dual-license their code, although I think the answer will be no.
> > > >> I'll also think of a way to ship my extension as a set of jar files
> > > >> without the need to change the core of Pig. This way, it can be
> > > >> easily ported to newer versions of Pig.
> > > >>
> > > >> Thanks
> > > >> Ahmed
> > > >>
> > > >> Best regards,
> > > >> Ahmed Eldawy
> > > >>
> > > >>
> > > >> On Thu, May 2, 2013 at 12:33 PM, Alan Gates <ga...@hortonworks.com>
> > > >> wrote:
> > > >>
> > > >>> I know this is frustrating, but the different licenses do have
> > > >>> different requirements that make it so that Apache can't ship GPL
> > > >>> code. A legal explanation is at
> > > >>> http://www.apache.org/licenses/GPL-compatibility.html
> > > >>> For additional info on the LGPL specific questions see
> > > >>> http://www.apache.org/legal/3party.html
> > > >>>
> > > >>> As far as pulling it in via ivy, the issue isn't so much where the
> > > >>> code lives as much as what code we are requiring to make Pig work.
> > > >>> If something that is [L]GPL is required for Pig it violates Apache
> > > >>> rules as outlined above. It also would be a show stopper for a lot
> > > >>> of companies that redistribute Pig and that are allergic to GPL
> > > >>> software.
> > > >>>
> > > >>> So, as I said before, if you wanted to continue with that library
> > > >>> and they are not willing to relicense it then it would have to be
> > > >>> bolted on after Apache Pig is built. Nothing stops you from doing
> > > >>> this by downloading Apache Pig, adding this library and your code,
> > > >>> and redistributing, though it wouldn't then be open to all Pig
> > > >>> users.
> > > >>>
> > > >>> Alan.
> > > >>>
> > > >>> On May 1, 2013, at 6:08 PM, Ahmed Eldawy wrote:
> > > >>>
> > > >>>> Thanks for your response. I was never good at differentiating all
> > > >>>> those open source licenses. I mean, what is the point of making
> > > >>>> open source licenses if they block me from using a library in an
> > > >>>> open source project? Anyway, I'm not going into a debate here.
> > > >>>> Just one question: if we use JTS as a library (jar file) without
> > > >>>> adding the code in Pig, is it still a violation? We'll use ivy,
> > > >>>> for example, to download the jar file when compiling.
> > > >>>> On May 1, 2013 7:50 PM, "Alan Gates" <ga...@hortonworks.com> wrote:
> > > >>>>
> > > >>>>> Passing on the technical details for a moment, I see a licensing
> > > >>>>> issue. JTS is licensed under LGPL. Apache projects cannot contain
> > > >>>>> or ship [L]GPL. Apache does not meet the requirements of GPL and
> > > >>>>> thus we cannot repackage their code. If you wanted to go forward
> > > >>>>> using that class this would have to be packaged as an add-on that
> > > >>>>> was downloaded separately and not from Apache. Another option is
> > > >>>>> to work with the JTS community and see if they are willing to
> > > >>>>> dual license their code under BSD or Apache license so that Pig
> > > >>>>> could include it.
> > > >>>>> If neither of those is an option you would need to come up with
> > > >>>>> a new class to contain your spatial data.
> > > >>>>>
> > > >>>>> Alan.
> > > >>>>>
> > > >>>>> On May 1, 2013, at 5:40 PM, Ahmed Eldawy wrote:
> > > >>>>>
> > > >>>>>> Hi all,
> > > >>>>>> First, sorry for the long email. I wanted to put all my thoughts
> > > >>>>>> here and get your feedback.
> > > >>>>>> I'm proposing a major addition to Pig that will greatly increase
> > > >>>>>> its functionality and user base. It is simply to add spatial
> > > >>>>>> support to the language and the framework. I've already started
> > > >>>>>> working on that but I don't want it to be just another branch. I
> > > >>>>>> want it, eventually, to be merged with the trunk of Apache Pig.
> > > >>>>>> So, I'm sending this email mainly to reach out to the main
> > > >>>>>> contributors of Pig to see the feasibility of this.
> > > >>>>>> This addition is a part of a big project we have been working on
> > > >>>>>> at the University of Minnesota; the project is called Spatial
> > > >>>>>> Hadoop (http://spatialhadoop.cs.umn.edu). It's about building a
> > > >>>>>> MapReduce framework (Hadoop) that is capable of maintaining and
> > > >>>>>> analyzing spatial data efficiently. I'm the main guy behind that
> > > >>>>>> project and since we released its first version, we received
> > > >>>>>> very encouraging responses from different groups in the research
> > > >>>>>> and industrial community. I'm sure the addition we want to make
> > > >>>>>> to Pig Latin will be widely accepted by the people in the
> > > >>>>>> spatial community.
> > > >>>>>> I'm proposing a plan here while we're still in the early phases
> > > >>>>>> of this task to be able to discuss it with the main contributors
> > > >>>>>> and see its feasibility.
> > > >>>>>> First of all, I think that we need to change the core of Pig to
> > > >>>>>> be able to support spatial data. Providing a set of UDFs only is
> > > >>>>>> not enough. The main reason is that Pig Latin does not provide a
> > > >>>>>> way to create a new data type, which is needed for spatial data.
> > > >>>>>> Once we have the spatial data types we need, the functionality
> > > >>>>>> can be expanded using more UDFs.
> > > >>>>>>
> > > >>>>>> Here's the plan as I see it.
> > > >>>>>> 1- Introduce a new primitive data type Geometry which represents
> > > >>>>>> all spatial data types. In the underlying system, this will map
> > > >>>>>> to com.vividsolutions.jts.geom.Geometry. This is a class from
> > > >>>>>> Java Topology Suite (JTS)
> > > >>>>>> [http://www.vividsolutions.com/jts/JTSHome.htm], a stable and
> > > >>>>>> efficient open source Java library for spatial data types and
> > > >>>>>> algorithms. It is very popular in the spatial community and a
> > > >>>>>> C++ port of it is used in PostGIS [http://postgis.net/] (a
> > > >>>>>> spatial library for Postgres). JTS also conforms to the
> > > >>>>>> standards of the Open Geospatial Consortium (OGC)
> > > >>>>>> [http://www.opengeospatial.org/], which defines open standards
> > > >>>>>> for spatial data types. The Geometry data type is read from and
> > > >>>>>> written to text files using the Well Known Text (WKT) format.
> > > >>>>>> There is also a way to convert it to/from binary so that it can
> > > >>>>>> work with binary files and streams.
> > > >>>>>> 2- Add functions that manipulate spatial data types. These will
> > > >>>>>> be added as UDFs and we will not need to mess with the internals
> > > >>>>>> of Pig. Most probably, there will be one new class for each
> > > >>>>>> operation (e.g., union or intersection).
> > > >>>>>> I think it will be good to put these new operations inside the
> > > >>>>>> core of Pig so that users can use them without having to write
> > > >>>>>> the fully qualified class name. Also, since there is no way to
> > > >>>>>> implicitly cast a spatial data type to a non-spatial data type,
> > > >>>>>> there will not be any conflicts in existing operations or new
> > > >>>>>> operations. All new operations, and only the new operations,
> > > >>>>>> will be working on spatial data types. Here is an initial list
> > > >>>>>> of operations that can be added. All those operations are
> > > >>>>>> already implemented in JTS and the UDFs added to Pig will be
> > > >>>>>> just wrappers around them.
> > > >>>>>> **Predicates (used for spatial filtering)
> > > >>>>>> Equals
> > > >>>>>> Disjoint
> > > >>>>>> Intersects
> > > >>>>>> Touches
> > > >>>>>> Crosses
> > > >>>>>> Within
> > > >>>>>> Contains
> > > >>>>>> Overlaps
> > > >>>>>>
> > > >>>>>> **Operations
> > > >>>>>> Envelope
> > > >>>>>> Area
> > > >>>>>> Length
> > > >>>>>> Buffer
> > > >>>>>> ConvexHull
> > > >>>>>> Intersection
> > > >>>>>> Union
> > > >>>>>> Difference
> > > >>>>>> SymDifference
> > > >>>>>>
> > > >>>>>> **Aggregate functions
> > > >>>>>> Accum
> > > >>>>>> ConvexHull
> > > >>>>>> Union
> > > >>>>>>
> > > >>>>>> 3- The third step is to implement spatial indexes (e.g., Grid or
> > > >>>>>> R-tree). A Pig loader and Pig output classes will be created for
> > > >>>>>> those indexes. Note that currently we have SpatialOutputFormat
> > > >>>>>> and SpatialInputFormat for those indexes inside the Spatial
> > > >>>>>> Hadoop project, but we need to tweak them to work with Pig.
> > > >>>>>>
> > > >>>>>> 4- (Advanced) Implement more sophisticated algorithms for
> > > >>>>>> spatial operations that utilize the indexes.
> > > >>>>>> For example, we can have a specific algorithm for spatial range
> > > >>>>>> query or spatial join. Again, we already have algorithms built
> > > >>>>>> for different operations implemented in Spatial Hadoop as
> > > >>>>>> MapReduce programs, but they will need to be modified to work in
> > > >>>>>> the Pig environment and to work with other operations.
> > > >>>>>>
> > > >>>>>> This is my whole plan for the spatial extension to Pig. I've
> > > >>>>>> already started with the first step but, as I mentioned earlier,
> > > >>>>>> I don't want to do the work for our project and then have the
> > > >>>>>> work be forgotten. I want to contribute to Pig and do my
> > > >>>>>> research at the same time. If you think the plan is plausible,
> > > >>>>>> I'll open JIRA issues for the above tasks and start shipping
> > > >>>>>> patches. I'll conform with the standards of the project such as
> > > >>>>>> adding tests and commenting the code well.
> > > >>>>>> Sorry for the long email and hope to hear back from you.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> Best regards,
> > > >>>>>> Ahmed Eldawy
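
Step 1 of the plan (converting a Geometry between WKT text and binary) and the short-term suggestion in the replies (passing the geometry around as a bytearray) both come down to a text/binary round trip. Below is a minimal sketch of that round trip for a single 2D point. It is written in Python rather than Java so that it runs without JTS, and the helper names are hypothetical: they are not Pig or JTS APIs. The binary layout assumed here is little-endian Well Known Binary (WKB) for a point: one byte-order byte, a 32-bit geometry type (1 for Point), then the x and y coordinates as doubles.

```python
import re
import struct

# Hypothetical helpers for illustration only; not part of Pig or JTS.
# Matches WKT such as "POINT (1.5 -2)" and captures the two coordinates.
_POINT_WKT = re.compile(r"^\s*POINT\s*\(\s*(\S+)\s+(\S+)\s*\)\s*$", re.IGNORECASE)

# Little-endian WKB for a 2D point: byte order, geometry type, x, y.
_POINT_WKB = "<BIdd"
_LITTLE_ENDIAN = 1
_WKB_POINT = 1


def point_wkt_to_wkb(wkt: str) -> bytes:
    """Serialize a WKT point into the bytes a bytearray field would carry."""
    match = _POINT_WKT.match(wkt)
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    x, y = float(match.group(1)), float(match.group(2))
    return struct.pack(_POINT_WKB, _LITTLE_ENDIAN, _WKB_POINT, x, y)


def point_wkb_to_wkt(data: bytes) -> str:
    """Deserialize the bytes back into WKT, as a UDF would on each call."""
    byte_order, geom_type, x, y = struct.unpack(_POINT_WKB, data)
    if byte_order != _LITTLE_ENDIAN or geom_type != _WKB_POINT:
        raise ValueError("only little-endian WKB points are handled here")
    return f"POINT ({x!r} {y!r})"


if __name__ == "__main__":
    encoded = point_wkt_to_wkb("POINT (1.5 -2)")
    print(len(encoded), point_wkb_to_wkt(encoded))
```

In the bytearray approach, every spatial UDF would pay this decode step on input and the encode step on output, which is the overhead a native Geometry type in Pig would avoid.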