Hi Gautham,

Please try calling FileTokenNameMetExtractor from the Crawler (MetExtractorProductCrawler).

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Adjunct Associate Professor, Computer Science Department
University of Southern California
Los Angeles, CA 90089 USA
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Gautham Gowrishankar <[email protected]>
Date: Monday, November 3, 2014 at 6:11 PM
To: Chris Mattmann <[email protected]>
Subject: Re: Regarding Assignment 2

>Hello Professor,
>
>I was able to figure out how to configure the config.xml file for
>FileTokenNameMetExtractor,so how should i include the config path.
>
>Should this be set in filemanager.properties file or as cas-external file
>extracotor as below
>
>Inside filemanger.properties as
>
><extractor class=org......FileNameTokenMetExtractor>
>
></extractor>
>
>or
>========================================================
><?xml version="1.0" encoding="UTF-8"?>
><cas:externextractor xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
>  <exec workingDir="">
>    <extractorBinPath envReplace="true">[PWD]/extractor</extractorBinPath>
>    <args>
>      <arg isDataFile="true"/>
>      <arg isPath="true">/usr/local/etc/testExtractor.config</arg>
>    </args>
>  </exec>
></cas:externextractor>
>
>Also let us know how the path for the config.xml file should be
>configured.
>
>Regards
>Gautham
>
>On Sun, Nov 2, 2014 at 9:27 PM, Christian Alan Mattmann
><[email protected]> wrote:
>
>Yes a blog and better yet a wiki post on the OODT wiki
>would be much appreciated! :-)
>
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Chris Mattmann, Ph.D.
>Adjunct Associate Professor, Computer Science Department
>University of Southern California
>Los Angeles, CA 90089 USA
>Email: [email protected]
>WWW: http://sunset.usc.edu/~mattmann/
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>-----Original Message-----
>From: Gautham Gowrishankar <[email protected]>
>Date: Sunday, November 2, 2014 at 9:51 PM
>To: Chris Mattmann <[email protected]>
>Subject: Re: Regarding Assignment 2
>
>>Professor,
>>
>>I would look into that right now.
>>
>>I would probably write a blog on this and send it to you .
>>There is so much of information and yet is very hard to find it in at a
>>single point w.r.t OODT that is what makes it so hard :)
>>
>>Regards
>>Gautham
>>
>>On Sun, Nov 2, 2014 at 8:42 PM, Christian Alan Mattmann
>><[email protected]> wrote:
>>
>>Check out src/main/resources/examples in the metadata folder..
>>
>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>Chris Mattmann, Ph.D.
>>Adjunct Associate Professor, Computer Science Department
>>University of Southern California
>>Los Angeles, CA 90089 USA
>>Email: [email protected]
>>WWW: http://sunset.usc.edu/~mattmann/
>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>>-----Original Message-----
>>From: Gautham Gowrishankar <[email protected]>
>>Date: Sunday, November 2, 2014 at 4:39 PM
>>To: Chris Mattmann <[email protected]>
>>Subject: Re: Regarding Assignment 2
>>
>>>Hello Professor,
>>>
>>>I was trying to configure my FileTokenMetExtractor, should it be
>>>configured as a external metadat extractor,which i dont think so.
>>>
>>><?xml version="1.0" encoding="UTF-8"?>
>>><cas:externextractor xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
>>>  <exec workingDir="">
>>>    <extractorBinPath envReplace="true">[PWD]/extractor</extractorBinPath>
>>>    <args>
>>>      <arg isDataFile="true"/>
>>>      <arg isPath="true">/usr/local/etc/testExtractor.config</arg>
>>>    </args>
>>>  </exec>
>>></cas:externextractor>
>>>Could you provide a link where it is shown as a example how to configure
>>>new Extractors and what argument names should the file be sent.
>>>Regards
>>>Gautham
>>>
>>>On Sun, Nov 2, 2014 at 9:41 AM, Christian Alan Mattmann
>>><[email protected]> wrote:
>>>
>>>Hi Gautham,
>>>
>>>Thanks and sorry that it’s difficult. Yes, it’s one of the
>>>harder ones.
>>>
>>>As for the metadata, don’t worry about getting it perfect,
>>>just get going and then you can easily iterate (that’s the
>>>point of using OODT).
>>>
>>>Spanish doesn’t really matter (it’s per job type and the
>>>spanish fields are equivalent to the English ones). Also
>>>there is a program in ETLlib that may help you (translatejson).
>>>
>>>I told you how to do the InPlaceDataTransfer - use the
>>>data transferer and check the docs in file manager.
>>>
>>>Cheers,
>>>Chris
>>>
>>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>Chris Mattmann, Ph.D.
>>>Adjunct Associate Professor, Computer Science Department
>>>University of Southern California
>>>Los Angeles, CA 90089 USA
>>>Email: [email protected]
>>>WWW: http://sunset.usc.edu/~mattmann/
>>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>>-----Original Message-----
>>>From: Gautham Gowrishankar <[email protected]>
>>>Date: Sunday, November 2, 2014 at 10:26 AM
>>>To: Chris Mattmann <[email protected]>
>>>Subject: Re: Regarding Assignment 2
>>>
>>>>Hello Professor,
>>>>
>>>>This is actually a hard assignment trying to figure out what actually
>>>>to do :( .
>>>>
>>>>I am actually really trying to think what else can we add to the
>>>>metadata already present (language is one thing i can think of at the
>>>>moment)
>>>>
>>>>Another question is since the Data is in Spanish wont it be
>>>>inconvenient to query on such terms and provide unwanted results
>>>>without performing the actual translation.
>>>>
>>>>Also will disabling the path for Data Archive in File Manger properties
>>>>be enough to do the in place data ingestion.
>>>>
>>>>Regards
>>>>Gautham
>>>>
>>>>On Sun, Nov 2, 2014 at 9:16 AM, Christian Alan Mattmann
>>>><[email protected]> wrote:
>>>>
>>>>Hi Gautham,
>>>>
>>>>Answers below:
>>>>
>>>>-----Original Message-----
>>>>From: Gautham Gowrishankar <[email protected]>
>>>>Date: Sunday, November 2, 2014 at 10:13 AM
>>>>To: Chris Mattmann <[email protected]>
>>>>Subject: Re: Regarding Assignment 2
>>>>
>>>>>Hello Professor,
>>>>>
>>>>>As recommended i have gone through a number of Links apart from the
>>>>>one you suggested and here is the conclusion i have drawn before i
>>>>>actually start implementing it today :P
>>>>>
>>>>>1 File Manager would extract Metadata----(id would be one using
>>>>>FileNameTokenMetaData Extractor) .Kindly suggest if i need to use
>>>>>anything else that would be necessary like Copy and Rewrite Extractor.
>>>>
>>>>Yep, and other metadata too.
>>>>
>>>>>
>>>>>2. I guess like you suggested in class it would be nice just do the
>>>>>above task in place without injesting the actual files Can this be
>>>>>done by disabling the path for the Data Archive in Filemanger
>>>>>properties ?
>>>>
>>>>Use the InPlaceDataTransferer
>>>>
>>>>>
>>>>>3. Shell script to to do the above task(extract metadata from
>>>>>FileManger by iterating over all the files).
>>>>
>>>>Yep.
>>>>
>>>>>
>>>>>4.Write a CasPge Task to combine the Metadata Extracted with JSON
>>>>>Files and user poster.py to post it into solar.
>>>>
>>>>s/solar/Solr/
>>>>
>>>>Yep.
>>>>
>>>>>
>>>>>5.Start the workflow manger with the above events configured.
>>>>
>>>>Yep.
>>>>
>>>>>
>>>>>6.Pre Configure Solr Schema to recognize the above fields along with
>>>>>Id field
>>>>
>>>>Yep.
>>>>
>>>>>
>>>>>7.Write functional queries to test the above.
>>>>
>>>>Yep.
>>>>
>>>>>===============================================================
>>>>>Kindly suggest if we are missing out on the current tasks planned
>>>>>answer the below questions
>>>>>
>>>>>Any other Metadata Extractor that needs to be used.
>>>>
>>>>Can’t say - up to you on this.
>>>>
>>>>>Hints on Link Analysis example and where it can be done i OODT.
>>>>
>>>>Link Analysis should be a piece of custom code that you implement
>>>>(after indexing say in FM, or during) in which you use the built up
>>>>information to construct a “linkRank” score before indexing
>>>>in Solr (via CAS-PGE and ETLLib/poster).
>>>>
>>>>>Is Function Queries like Recip() and DateBoosting a good enough trick
>>>>>to do the queries.
>>>>
>>>>These are the types of things to look at for the Content based ranking.
>>>>
>>>>Cheers,
>>>>Chris
>>>>
>>>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>Chris Mattmann, Ph.D.
>>>>Adjunct Associate Professor, Computer Science Department
>>>>University of Southern California
>>>>Los Angeles, CA 90089 USA
>>>>Email: [email protected]
>>>>WWW: http://sunset.usc.edu/~mattmann/
>>>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>
>>>>>
>>>>>On Sat, Nov 1, 2014 at 2:41 PM, Christian Alan Mattmann
>>>>><[email protected]> wrote:
>>>>>
>>>>>Already done.
>>>>>
>>>>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>Chris Mattmann, Ph.D.
>>>>>Adjunct Associate Professor, Computer Science Department
>>>>>University of Southern California
>>>>>Los Angeles, CA 90089 USA
>>>>>Email: [email protected]
>>>>>WWW: http://sunset.usc.edu/~mattmann/
>>>>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>
>>>>>-----Original Message-----
>>>>>From: Gautham Gowrishankar <[email protected]>
>>>>>Date: Saturday, November 1, 2014 at 11:02 AM
>>>>>To: Chris Mattmann <[email protected]>
>>>>>Subject: Re: Regarding Assignment 2
>>>>>
>>>>>>Hello Professor,
>>>>>>
>>>>>>Kindly reply to my earlier mail .
>>>>>>
>>>>>>Regards
>>>>>>Gautham
>>>>>>
>>>>>>On Fri, Oct 31, 2014 at 5:05 PM, Gautham Gowrishankar
>>>>>><[email protected]> wrote:
>>>>>>
>>>>>>Hello Professor,
>>>>>>
>>>>>>Looking at the queries you have asked we have derived that
>>>>>>
>>>>>>Only certain fields of the JSON dataset would be required to be
>>>>>>extracted like
>>>>>>Posted Date
>>>>>>Title
>>>>>>Start
>>>>>>Duration
>>>>>>Job Type
>>>>>>Company
>>>>>>
>>>>>>Fist Seen Date
>>>>>>Location
>>>>>>Last Seen
>>>>>>
>>>>>>Query 1
>>>>>>Predict which geospatial areas will have which job types in the
>>>>>>future.
>>>>>>====================================
>>>>>>Arrange by Descdening Dates with Count for Each Job Type and provide
>>>>>>proper weights tor rank them.
>>>>>>
>>>>>>Query 2
>>>>>>Compare jobs in terms of quickly they’re filled specifically in
>>>>>>regards to region
>>>>>>=====================================
>>>>>>For given each region provide stat of comparison b/w the diff=(first
>>>>>>seen date - last seen date) for each Job
>>>>>>
>>>>>>Query 3
>>>>>>Can you classify and zone cities based on the jobs data (E.G.
>>>>>>commercial shopping region, industrial, residential, business offices,
>>>>>>medical, etc)?
>>>>>>=====================================
>>>>>>
>>>>>>Query 4
>>>>>>What are the trends as it relates to full time vs part time
>>>>>>employment in South America?
>>>>>>======================================
>>>>>>For each Time Interval -----compare Part vs Full Time (Job Type) stat
>>>>>>according to the Location
>>>>>>
>>>>>>Kindly answer the below question on the above conclusion drawn
>>>>>>1.Do we need extract only the above fields stated as metadata from
>>>>>>the JSON dataset. In case we need to extract only certain
>>>>>>fields should this be done through script or Java pg and where can we
>>>>>>find necessary material.
>>>>>>
>>>>>>2.Kindly point us to some material where we can find a way to injest
>>>>>>our algorithms(Ranking) into Solr
>>>>>>
>>>>>>3.Give us hints as to where we need to look for Querying through
>>>>>>Solr.
>>>>>>
>>>>>>Regards
>>>>>>Gautham
>>>>>>
>>>>>>On Mon, Oct 27, 2014 at 6:31 PM, Gautham Gowrishankar
>>>>>><[email protected]> wrote:
>>>>>>
>>>>>>Hi Prof,
>>>>>>Below are ResourceManger stub Logs and attached is the status seen on
>>>>>>GUI
>>>>>>============================================================
>>>>>>java.lang.Exception: batchstub.executeJob returned false
>>>>>>    at org.apache.oodt.cas.resource.batchmgr.XmlRpcBatchMgrProxy.run(XmlRpcBatchMgrProxy.java:125)
>>>>>>
>>>>>>and below is the WorkFlow Manger Logs
>>>>>>----------------------------------------------------------------------
>>>>>>
>>>>>>FINEST: [{job.queueName=high,
>>>>>>job.instanceClassName=org.apache.oodt.cas.workflow.structs.TaskJob,
>>>>>>job.name=urn:oodt:FileConcatenator,
>>>>>>job.id=, job.status=,
>>>>>>job.load=2,
>>>>>>job.inputClassName=org.apache.oodt.cas.workflow.structs.TaskJobInput},
>>>>>>{task.instance.class=org.apache.oodt.pge.examples.fileconcatenator.FileConcatenatorPGETask,
>>>>>>task.config={PGETask_ConfigFilePath=null/file_concatenator/pge-configs/PGEConfig.xml,
>>>>>>PCS_ClientTransferServiceFactory=org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory,
>>>>>>PCS_ActionRepoFile=file:/Users/Adarsh/oodt-deploy/crawler/policy/crawler-config.xml,
>>>>>>PCS_MetFileExtension=met, PGETask_DumpMetadata=true,
>>>>>>PCS_WorkflowManagerUrl=http://localhost:9200,
>>>>>>PCS_FileManagerUrl=http://localhost:9000,
>>>>>>PGETask_Name=FileConcatenator},
>>>>>>task.metadata={TaskId=[urn:oodt:FileConcatenator],
>>>>>>WorkflowManagerUrl=[http://Adarshs-MacBook-Pro.local:9200],
>>>>>>JobId=[a551fd81-5e3c-11e4-b229-73fd473a7137], RunID=[testNumber2],
>>>>>>ProcessingNode=[Adarshs-MacBook-Pro.local],
>>>>>>WorkflowInstId=[a551fd81-5e3c-11e4-b229-73fd473a7137]}}]
>>>>>>Kindly let us know since the exception is a single line and we are
>>>>>>not able to figure out the source.
>>>>>>Regards
>>>>>>Gautham
>>>>>>
>>>>>>On Mon, Oct 27, 2014 at 4:11 PM, Gautham Gowrishankar
>>>>>><[email protected]> wrote:
>>>>>>
>>>>>>Hi Prof,
>>>>>>
>>>>>>We were trying out the CasPGE learn by example tutorial,but after
>>>>>>starting out the workflow it has taken 15 mins and only 33% has been
>>>>>>completed.
>>>>>>
>>>>>>Attached is the screenshot,Kindly let us know whether we are on the
>>>>>>right track.
>>>>>>
>>>>>>Regards
>>>>>>Gautham
>>>>>>
>>>>>>On Mon, Oct 27, 2014 at 11:51 AM, Gautham Gowrishankar
>>>>>><[email protected]> wrote:
>>>>>>
>>>>>>Hy Professor,
>>>>>>
>>>>>>I have issues running the ./querytool the following lines are what
>>>>>>it seems to be pointing to
>>>>>>
>>>>>>==================
>>>>>>"$_RUNJAVA" $JAVA_OPTS $OODT_OPTS \
>>>>>>  -Djava.endorsed.dirs=../lib \
>>>>>>  org.apache.oodt.cas.filemgr.tools.QueryTool "$@"
>>>>>>
>>>>>>Any idea ? what the issue u think
>>>>>>
>>>>>>Regards
>>>>>>Gautham
>>>>>>
>>>>>>On Mon, Oct 20, 2014 at 7:45 PM, Christian Alan Mattmann
>>>>>><[email protected]> wrote:
>>>>>>
>>>>>>Hi Gautham,
>>>>>>
>>>>>>Thanks for your question - one of the main reasons is that
>>>>>>you can keep track of the upstream provenance using OODT
>>>>>>which may or may not aid you in your ranking computation
>>>>>>later on. There are some other things (e.g., automated
>>>>>>benchmarking and so forth) that OODT provides.
>>>>>>
>>>>>>ETLlib also has an easy poster too that I'd like you guys
>>>>>>to try using that's just as easy (if not more) than post.jar.
>>>>>>
>>>>>>Cheers,
>>>>>>Chris
>>>>>>
>>>>>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>Chris Mattmann, Ph.D.
>>>>>>Adjunct Associate Professor, Computer Science Department
>>>>>>University of Southern California
>>>>>>Los Angeles, CA 90089 USA
>>>>>>Email: [email protected]
>>>>>>WWW: http://sunset.usc.edu/~mattmann/
>>>>>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>
>>>>>>-----Original Message-----
>>>>>>From: Gautham Gowrishankar <[email protected]>
>>>>>>Date: Monday, October 20, 2014 at 5:07 PM
>>>>>>To: Chris Mattmann <[email protected]>
>>>>>>Subject: Regarding Assignment 2
>>>>>>
>>>>>>>Hello Professor,
>>>>>>>
>>>>>>>I had a question regarding the index,
>>>>>>>Why were we asked to index the JSON files using OODT and ETLib when
>>>>>>>Solr has the capability to perform automatic indexing for eg using the
>>>>>>>post.jar.
>>>>>>>
>>>>>>>Regards
>>>>>>>Gautham
