Yes, a blog, and better yet a wiki post on the OODT wiki, would be much appreciated! :-)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Adjunct Associate Professor, Computer Science Department
University of Southern California
Los Angeles, CA 90089 USA
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Gautham Gowrishankar <[email protected]>
Date: Sunday, November 2, 2014 at 9:51 PM
To: Chris Mattmann <[email protected]>
Subject: Re: Regarding Assignment 2

>Professor,
>
>I will look into that right now.
>
>I will probably write a blog post on this and send it to you.
>There is so much information on OODT, yet it is very hard to find it in
>a single place; that is what makes it so hard :)
>
>Regards
>Gautham
>
>On Sun, Nov 2, 2014 at 8:42 PM, Christian Alan Mattmann
><[email protected]> wrote:
>
>Check out src/main/resources/examples in the metadata folder.
>
>-----Original Message-----
>From: Gautham Gowrishankar <[email protected]>
>Date: Sunday, November 2, 2014 at 4:39 PM
>To: Chris Mattmann <[email protected]>
>Subject: Re: Regarding Assignment 2
>
>>Hello Professor,
>>
>>I was trying to configure my FileTokenMetExtractor. Should it be
>>configured as an external metadata extractor? I don't think so.
>>
>><?xml version="1.0" encoding="UTF-8"?>
>><cas:externextractor xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
>>   <exec workingDir="">
>>      <extractorBinPath envReplace="true">[PWD]/extractor</extractorBinPath>
>>      <args>
>>         <arg isDataFile="true"/>
>>         <arg isPath="true">/usr/local/etc/testExtractor.config</arg>
>>      </args>
>>   </exec>
>></cas:externextractor>
>>
>>Could you provide a link to an example that shows how to configure
>>new extractors, and under what argument names the file should be passed?
>>
>>Regards
>>Gautham
>>
>>On Sun, Nov 2, 2014 at 9:41 AM, Christian Alan Mattmann
>><[email protected]> wrote:
>>
>>Hi Gautham,
>>
>>Thanks and sorry that it’s difficult. Yes, it’s one of the
>>harder ones.
>>
>>As for the metadata, don’t worry about getting it perfect,
>>just get going and then you can easily iterate (that’s the
>>point of using OODT).
>>
>>Spanish doesn’t really matter (it’s per job type and the
>>Spanish fields are equivalent to the English ones). Also
>>there is a program in ETLlib that may help you (translatejson).
>>
>>I told you how to do the InPlaceDataTransfer - use the
>>data transferer and check the docs in file manager.
>>
>>Cheers,
>>Chris
>>
>>-----Original Message-----
>>From: Gautham Gowrishankar <[email protected]>
>>Date: Sunday, November 2, 2014 at 10:26 AM
>>To: Chris Mattmann <[email protected]>
>>Subject: Re: Regarding Assignment 2
>>
>>>Hello Professor,
>>>
>>>This is actually a hard assignment; it is difficult to figure out what
>>>to actually do :( .
>>>
>>>I am really trying to think what else we can add to the metadata
>>>already present (language is one thing I can think of at the moment).
>>>
>>>Another question: since the data is in Spanish, won't it be inconvenient
>>>to query on such terms, and won't queries return unwanted results unless
>>>we perform the actual translation?
>>>
>>>Also, will disabling the path for the Data Archive in the File Manager
>>>properties be enough to do the in-place data ingestion?
>>>
>>>Regards
>>>Gautham
>>>
>>>On Sun, Nov 2, 2014 at 9:16 AM, Christian Alan Mattmann
>>><[email protected]> wrote:
>>>
>>>Hi Gautham,
>>>
>>>Answers below:
>>>
>>>-----Original Message-----
>>>From: Gautham Gowrishankar <[email protected]>
>>>Date: Sunday, November 2, 2014 at 10:13 AM
>>>To: Chris Mattmann <[email protected]>
>>>Subject: Re: Regarding Assignment 2
>>>
>>>>Hello Professor,
>>>>
>>>>As recommended, I have gone through a number of links apart from the one
>>>>you suggested, and here is the conclusion I have drawn before I actually
>>>>start implementing it today :P
>>>>
>>>>1. File Manager would extract metadata (id would be one, using
>>>>FileNameTokenMetaData Extractor). Kindly suggest if I need to use
>>>>anything else that would be necessary, like Copy and Rewrite Extractor.
>>>
>>>Yep, and other metadata too.
>>>
>>>>2. As you suggested in class, it would be nice to do the above task
>>>>in place without ingesting the actual files. Can this be done
>>>>by disabling the path for the Data Archive in the File Manager properties?
>>>
>>>Use the InPlaceDataTransferer
>>>
>>>>3. Shell script to do the above task (extract metadata from File Manager
>>>>by iterating over all the files).
>>>
>>>Yep.
>>>
>>>>4. Write a CAS-PGE task to combine the extracted metadata with the JSON
>>>>files and use poster.py to post it into solar.
>>>
>>>s/solar/Solr/
>>>
>>>Yep.
>>>
>>>>5. Start the Workflow Manager with the above events configured.
>>>
>>>Yep.
>>>
>>>>6. Preconfigure the Solr schema to recognize the above fields along
>>>>with the id field.
>>>
>>>Yep.
>>>
>>>>7. Write function queries to test the above.
>>>
>>>Yep.
>>>
>>>>===============================================================
>>>>Kindly suggest if we are missing anything in the tasks planned, and
>>>>answer the questions below:
>>>>
>>>>Any other metadata extractor that needs to be used?
>>>
>>>Can’t say - up to you on this.
>>>
>>>>Hints on a Link Analysis example and where it can be done in OODT?
>>>
>>>Link Analysis should be a piece of custom code that you implement (after
>>>indexing say in FM, or during) in which you use the built up information
>>>to construct a “linkRank” score before indexing
>>>in Solr (via CAS-PGE and ETLLib/poster).
>>>
>>>>Are function queries like recip() and date boosting a good enough trick
>>>>to do the queries?
>>>
>>>These are the types of things to look at for the Content based ranking.
>>>
>>>Cheers,
>>>Chris
>>>
>>>>On Sat, Nov 1, 2014 at 2:41 PM, Christian Alan Mattmann
>>>><[email protected]> wrote:
>>>>
>>>>Already done.
>>>>
>>>>-----Original Message-----
>>>>From: Gautham Gowrishankar <[email protected]>
>>>>Date: Saturday, November 1, 2014 at 11:02 AM
>>>>To: Chris Mattmann <[email protected]>
>>>>Subject: Re: Regarding Assignment 2
>>>>
>>>>>Hello Professor,
>>>>>
>>>>>Kindly reply to my earlier mail.
>>>>>
>>>>>Regards
>>>>>Gautham
>>>>>
>>>>>On Fri, Oct 31, 2014 at 5:05 PM, Gautham Gowrishankar
>>>>><[email protected]> wrote:
>>>>>
>>>>>Hello Professor,
>>>>>
>>>>>Looking at the queries you have asked, we have concluded that
>>>>>only certain fields of the JSON dataset need to be extracted,
>>>>>such as:
>>>>>Posted Date
>>>>>Title
>>>>>Start
>>>>>Duration
>>>>>Job Type
>>>>>Company
>>>>>First Seen Date
>>>>>Location
>>>>>Last Seen
>>>>>
>>>>>Query 1
>>>>>Predict which geospatial areas will have which job types in the
>>>>>future.
>>>>>====================================
>>>>>Arrange by descending dates with a count for each job type, and provide
>>>>>proper weights to rank them.
>>>>>
>>>>>Query 2
>>>>>Compare jobs in terms of how quickly they’re filled, specifically in
>>>>>regards to region
>>>>>=====================================
>>>>>For each given region, provide a comparison statistic based on
>>>>>diff = (first seen date - last seen date) for each job.
>>>>>
>>>>>Query 3
>>>>>Can you classify and zone cities based on the jobs data (E.G. commercial
>>>>>shopping region, industrial, residential, business offices, medical,
>>>>>etc)?
>>>>>=====================================
>>>>>
>>>>>Query 4
>>>>>What are the trends as it relates to full time vs part time employment
>>>>>in South America?
>>>>>======================================
>>>>>For each time interval, compare part-time vs full-time (Job Type)
>>>>>statistics according to the location.
>>>>>
>>>>>Kindly answer the questions below on the conclusions drawn above:
>>>>>1. Do we need to extract only the fields stated above as metadata from
>>>>>the JSON dataset? In case we need to extract only certain
>>>>>fields, should this be done through a script or a Java program, and
>>>>>where can we find the necessary material?
>>>>>
>>>>>2. Kindly point us to some material where we can find a way to ingest
>>>>>our (ranking) algorithms into Solr.
>>>>>
>>>>>3. Give us hints as to where we need to look for querying through Solr.
>>>>>
>>>>>Regards
>>>>>Gautham
>>>>>
>>>>>On Mon, Oct 27, 2014 at 6:31 PM, Gautham Gowrishankar
>>>>><[email protected]> wrote:
>>>>>
>>>>>Hi Prof,
>>>>>Below are the Resource Manager stub logs, and attached is the status
>>>>>seen in the GUI
>>>>>============================================================
>>>>>java.lang.Exception: batchstub.executeJob returned false
>>>>>        at org.apache.oodt.cas.resource.batchmgr.XmlRpcBatchMgrProxy.run(XmlRpcBatchMgrProxy.java:125)
>>>>>
>>>>>and below are the Workflow Manager logs
>>>>>----------------------------------------------------------------------
>>>>>
>>>>>FINEST: [{job.queueName=high,
>>>>>job.instanceClassName=org.apache.oodt.cas.workflow.structs.TaskJob,
>>>>>job.name=urn:oodt:FileConcatenator,
>>>>>job.id=, job.status=,
>>>>>job.load=2,
>>>>>job.inputClassName=org.apache.oodt.cas.workflow.structs.TaskJobInput},
>>>>>{task.instance.class=org.apache.oodt.pge.examples.fileconcatenator.FileConcatenatorPGETask,
>>>>>task.config={PGETask_ConfigFilePath=null/file_concatenator/pge-configs/PGEConfig.xml,
>>>>>PCS_ClientTransferServiceFactory=org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory,
>>>>>PCS_ActionRepoFile=file:/Users/Adarsh/oodt-deploy/crawler/policy/crawler-config.xml,
>>>>>PCS_MetFileExtension=met, PGETask_DumpMetadata=true,
>>>>>PCS_WorkflowManagerUrl=http://localhost:9200,
>>>>>PCS_FileManagerUrl=http://localhost:9000,
>>>>>PGETask_Name=FileConcatenator},
>>>>>task.metadata={TaskId=[urn:oodt:FileConcatenator],
>>>>>WorkflowManagerUrl=[http://Adarshs-MacBook-Pro.local:9200],
>>>>>JobId=[a551fd81-5e3c-11e4-b229-73fd473a7137], RunID=[testNumber2],
>>>>>ProcessingNode=[Adarshs-MacBook-Pro.local],
>>>>>WorkflowInstId=[a551fd81-5e3c-11e4-b229-73fd473a7137]}}]
>>>>>
>>>>>Kindly let us know, since the exception is a single line and we are not
>>>>>able to figure out the source.
>>>>>
>>>>>Regards
>>>>>Gautham
>>>>>
>>>>>On Mon, Oct 27, 2014 at 4:11 PM, Gautham Gowrishankar
>>>>><[email protected]> wrote:
>>>>>
>>>>>Hi Prof,
>>>>>
>>>>>We were trying out the CAS-PGE learn by example tutorial, but 15 minutes
>>>>>after starting the workflow only 33% has been completed.
>>>>>
>>>>>Attached is the screenshot. Kindly let us know whether we are on the
>>>>>right track.
>>>>>
>>>>>Regards
>>>>>Gautham
>>>>>
>>>>>On Mon, Oct 27, 2014 at 11:51 AM, Gautham Gowrishankar
>>>>><[email protected]> wrote:
>>>>>
>>>>>Hi Professor,
>>>>>
>>>>>I have issues running ./querytool; the following lines are what it
>>>>>seems to be pointing to:
>>>>>
>>>>>==================
>>>>>"$_RUNJAVA" $JAVA_OPTS $OODT_OPTS \
>>>>>  -Djava.endorsed.dirs=../lib \
>>>>>  org.apache.oodt.cas.filemgr.tools.QueryTool "$@"
>>>>>
>>>>>Any idea what the issue might be?
>>>>>
>>>>>Regards
>>>>>Gautham
>>>>>
>>>>>On Mon, Oct 20, 2014 at 7:45 PM, Christian Alan Mattmann
>>>>><[email protected]> wrote:
>>>>>
>>>>>Hi Gautham,
>>>>>
>>>>>Thanks for your question - one of the main reasons is that
>>>>>you can keep track of the upstream provenance using OODT
>>>>>which may or may not aid you in your ranking computation
>>>>>later on. There are some other things (e.g., automated
>>>>>benchmarking and so forth) that OODT provides.
>>>>>
>>>>>ETLlib also has an easy poster too that I’d like you guys
>>>>>to try using that’s just as easy (if not more) than post.jar.
>>>>>
>>>>>Cheers,
>>>>>Chris
>>>>>
>>>>>-----Original Message-----
>>>>>From: Gautham Gowrishankar <[email protected]>
>>>>>Date: Monday, October 20, 2014 at 5:07 PM
>>>>>To: Chris Mattmann <[email protected]>
>>>>>Subject: Regarding Assignment 2
>>>>>
>>>>>>Hello Professor,
>>>>>>
>>>>>>I had a question regarding the index:
>>>>>>why were we asked to index the JSON files using OODT and ETLlib when
>>>>>>Solr has the capability to perform automatic indexing, e.g. using
>>>>>>post.jar?
>>>>>>
>>>>>>Regards
>>>>>>Gautham
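
On the recip() and date-boosting question raised in the thread: Solr documents recip(x,m,a,b) as a/(m*x+b), and a commonly cited date boost has the form recip(ms(NOW,<datefield>),3.16e-11,1,1), where 3.16e-11 is roughly 1 divided by the number of milliseconds in a year. The Python sketch below is not from the thread; it only reproduces that arithmetic offline so the effect of the constants can be seen before they go into a query. The helper name date_boost, the sample dates, and the idea of applying it to a "posted date" field are assumptions for illustration.

```python
from datetime import datetime, timezone


def recip(x, m, a, b):
    """Same arithmetic as Solr's recip(x, m, a, b): a / (m * x + b)."""
    return a / (m * x + b)


def date_boost(posted, now, m=3.16e-11, a=1.0, b=1.0):
    """Boost for a document posted at `posted`, evaluated at `now`.

    Mirrors recip(ms(NOW, <datefield>), m, a, b): the document age in
    milliseconds is fed to recip, so newer documents score closer to a/b.
    The default constants are the commonly cited ones, not values taken
    from the assignment.
    """
    age_ms = (now - posted).total_seconds() * 1000.0
    return recip(age_ms, m, a, b)


if __name__ == "__main__":
    # Hypothetical posting dates, chosen only to show the decay.
    now = datetime(2014, 11, 2, tzinfo=timezone.utc)
    samples = [
        ("posted today", datetime(2014, 11, 2, tzinfo=timezone.utc)),
        ("posted one year ago", datetime(2013, 11, 2, tzinfo=timezone.utc)),
        ("posted two years ago", datetime(2012, 11, 2, tzinfo=timezone.utc)),
    ]
    for label, posted in samples:
        print(f"{label}: boost = {date_boost(posted, now):.3f}")
```

With these constants a document loses about half of its boost after one year; raising m makes the decay faster, and raising a and b together flattens it.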
