Re: Regarding Assignment 2

Christian Alan Mattmann Mon, 03 Nov 2014 19:05:11 -0800

Yes a blog and better yet a wiki post on the OODT wiki
would be much appreciated! :-)


++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Adjunct Associate Professor, Computer Science Department
University of Southern California
Los Angeles, CA 90089 USA
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++




-----Original Message-----
From: Gautham Gowrishankar <[email protected]>
Date: Sunday, November 2, 2014 at 9:51 PM
To: Chris Mattmann <[email protected]>
Subject: Re: Regarding Assignment 2

>Professor,
>
>I will look into that right now.
>
>
>I will probably write a blog post on this and send it to you.
>There is so much information, yet it is very hard to find it all in a
>single place w.r.t. OODT; that is what makes it so hard :)
>
>
>Regards
>Gautham
>
>
>
>On Sun, Nov 2, 2014 at 8:42 PM, Christian Alan Mattmann
><[email protected]> wrote:
>
>Check out src/main/resources/examples in the metadata folder..
>
>
>
>
>
>-----Original Message-----
>From: Gautham Gowrishankar <[email protected]>
>Date: Sunday, November 2, 2014 at 4:39 PM
>To: Chris Mattmann <[email protected]>
>Subject: Re: Regarding Assignment 2
>
>>Hello Professor,
>>
>>
>>I was trying to configure my FileTokenMetExtractor. Should it be
>>configured as an external metadata extractor? I don't think it should.
>>
>>
>>
>>
>><?xml version="1.0" encoding="UTF-8"?>
>><cas:externextractor xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
>>  <exec workingDir="">
>>    <extractorBinPath envReplace="true">[PWD]/extractor</extractorBinPath>
>>    <args>
>>      <arg isDataFile="true"/>
>>      <arg isPath="true">/usr/local/etc/testExtractor.config</arg>
>>    </args>
>>  </exec>
>></cas:externextractor>
>>Could you provide a link to an example showing how to configure
>>new extractors and which argument names should be used to pass the file?
>>
>>Regards
>>Gautham
>>
>>
>>On Sun, Nov 2, 2014 at 9:41 AM, Christian Alan Mattmann
>><[email protected]> wrote:
>>
>>Hi Gautham,
>>
>>Thanks and sorry that it’s difficult. Yes, it’s one of the
>>harder ones.
>>
>>As for the metadata, don’t worry about getting it perfect,
>>just get going and then you can easily iterate (that’s the
>>point of using OODT).
>>
>>Spanish doesn’t really matter (it’s per job type and the
>>Spanish fields are equivalent to the English ones). Also
>>there is a program in ETLlib that may help you (translatejson).
>>
>>I told you how to do the InPlaceDataTransfer - use the
>>data transferer and check the docs in file manager.
>>
>>Cheers,
>>Chris
>>
>>
>>
>>
>>
>>-----Original Message-----
>>From: Gautham Gowrishankar <[email protected]>
>>Date: Sunday, November 2, 2014 at 10:26 AM
>>To: Chris Mattmann <[email protected]>
>>Subject: Re: Regarding Assignment 2
>>
>>>Hello Professor,
>>>
>>>
>>>This is actually a hard assignment; I am trying to figure out what
>>>exactly to do :( .
>>>
>>>
>>>I am trying to think of what else we can add to the metadata
>>>already present (language is one thing I can think of at the moment).
>>>
>>>
>>>Another question: since the data is in Spanish, won't it be inconvenient
>>>to query on such terms, and won't it return unwanted results unless we
>>>perform the actual translation?
>>>
>>>
>>>Also, will disabling the path for the data archive in the File Manager
>>>properties be enough to do the in-place data ingestion?
>>>
>>>
>>>Regards
>>>Gautham
>>>
>>>
>>>On Sun, Nov 2, 2014 at 9:16 AM, Christian Alan Mattmann
>>><[email protected]> wrote:
>>>
>>>Hi Gautham,
>>>
>>>Answers below:
>>>
>>>
>>>
>>>-----Original Message-----
>>>From: Gautham Gowrishankar <[email protected]>
>>>Date: Sunday, November 2, 2014 at 10:13 AM
>>>To: Chris Mattmann <[email protected]>
>>>Subject: Re: Regarding Assignment 2
>>>
>>>>Hello Professor,
>>>>
>>>>
>>>>As recommended, I have gone through a number of links apart from the one
>>>>you suggested, and here is the conclusion I have drawn before I actually
>>>>start implementing it today :P
>>>>
>>>>
>>>>1. File Manager would extract metadata (id would be one, using the
>>>>FileNameTokenMetaData Extractor). Kindly suggest whether I need to use
>>>>anything else, like the Copy and Rewrite Extractor.
>>>
>>>Yep, and other metadata too.
>>>
>>>>
>>>>
>>>>2. As you suggested in class, it would be nice to do the above task
>>>>in place, without ingesting the actual files. Can this be done
>>>>by disabling the path for the data archive in the File Manager properties?
>>>
>>>Use the InPlaceDataTransferer
>>>
>>>>
>>>>
>>>>3. Write a shell script to do the above task (extract metadata from
>>>>File Manager by iterating over all the files).
>>>
>>>Yep.
>>>
>>>>
>>>>
>>>>4. Write a CAS-PGE task to combine the extracted metadata with the JSON
>>>>files and use poster.py to post it into solar.
>>>
>>>s/solar/Solr/
>>>
>>>Yep.
>>>
>>>>
>>>>
>>>>5. Start the Workflow Manager with the above events configured.
>>>
>>>Yep.
>>>
>>>>
>>>>
>>>>6. Preconfigure the Solr schema to recognize the above fields along with
>>>>the id field.
>>>
>>>Yep.
>>>
>>>>
>>>>
>>>>7. Write functional queries to test the above.
>>>
>>>Yep.
>>>
>>>>===============================================================
>>>>Kindly suggest whether we are missing anything in the tasks planned, and
>>>>answer the questions below.
>>>>
>>>>
>>>>Is there any other metadata extractor that needs to be used?
>>>
>>>Can’t say - up to you on this.
>>>
>>>>Any hints on a Link Analysis example and where it can be done in OODT?
>>>
>>>Link Analysis should be a piece of custom code that you implement (after
>>>indexing say in FM, or
>>>during) in which you use the built up information to construct a
>>>“linkRank” score before indexing
>>>in Solr (via CAS-PGE and ETLLib/poster).
>>>
>>>>Are function queries like recip() and date boosting a good enough trick
>>>>for the queries?
>>>
>>>These are the types of things to look at for the Content based ranking.
>>>
>>>Cheers,
>>>Chris
>>>
>>>
>>>
>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>On Sat, Nov 1, 2014 at 2:41 PM, Christian Alan Mattmann
>>>><[email protected]> wrote:
>>>>
>>>>Already done.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>-----Original Message-----
>>>>From: Gautham Gowrishankar <[email protected]>
>>>>Date: Saturday, November 1, 2014 at 11:02 AM
>>>>To: Chris Mattmann <[email protected]>
>>>>Subject: Re: Regarding Assignment 2
>>>>
>>>>>Hello Professor,
>>>>>
>>>>>
>>>>>Kindly reply to my earlier mail .
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>Regards
>>>>>Gautham
>>>>>
>>>>>
>>>>>On Fri, Oct 31, 2014 at 5:05 PM, Gautham Gowrishankar
>>>>><[email protected]> wrote:
>>>>>
>>>>>Hello Professor,
>>>>>
>>>>>
>>>>>Looking at the queries you have asked, we have concluded that
>>>>>only certain fields of the JSON dataset need to be extracted,
>>>>>such as:
>>>>>Posted Date
>>>>>Title
>>>>>Start
>>>>>Duration
>>>>>Job Type
>>>>>Company
>>>>>First Seen Date
>>>>>Location
>>>>>Last Seen
>>>>>
>>>>>
>>>>>
>>>>>Query 1
>>>>>Predict which geospatial areas will have which job types in the
>>>>>future.
>>>>>====================================
>>>>>Arrange by descending dates with a count for each job type, and provide
>>>>>proper weights to rank them.
>>>>>
>>>>>
>>>>>Query 2
>>>>>Compare jobs in terms of how quickly they’re filled, specifically with
>>>>>regard to region.
>>>>>=====================================
>>>>>For each region, provide comparison statistics on diff = (last seen
>>>>>date - first seen date) for each job.
>>>>>
>>>>>
>>>>>Query 3
>>>>>Can you classify and zone cities based on the jobs data (E.G.
>>>>>commercial
>>>>>shopping region, industrial, residential, business offices, medical,
>>>>>etc)?
>>>>>=====================================
>>>>>
>>>>>
>>>>>Query 4
>>>>>What are the trends as it relates to full time vs part time employment
>>>>>in
>>>>>South America?
>>>>>======================================
>>>>>For each time interval, compare part-time vs. full-time (Job Type)
>>>>>statistics by location.
>>>>>
>>>>>
>>>>>
>>>>>Kindly answer the questions below on the above conclusions:
>>>>>1. Do we need to extract only the above fields as metadata from the
>>>>>JSON dataset? In case we need to extract only certain fields, should
>>>>>this be done through a script or a Java program, and where can we
>>>>>find the necessary material?
>>>>>
>>>>>
>>>>>2. Kindly point us to some material on how to integrate our
>>>>>ranking algorithms into Solr.
>>>>>
>>>>>
>>>>>3. Give us hints on where to look for querying through Solr.
>>>>>
>>>>>
>>>>>Regards
>>>>>Gautham
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>On Mon, Oct 27, 2014 at 6:31 PM, Gautham Gowrishankar
>>>>><[email protected]> wrote:
>>>>>
>>>>>Hi Prof,
>>>>>Below are the Resource Manager stub logs; attached is the status seen
>>>>>in the GUI.
>>>>>============================================================
>>>>>java.lang.Exception: batchstub.executeJob returned false
>>>>>        at org.apache.oodt.cas.resource.batchmgr.XmlRpcBatchMgrProxy.run(XmlRpcBatchMgrProxy.java:125)
>>>>>
>>>>>
>>>>>and below are the Workflow Manager logs:
>>>>>----------------------------------------------------------------------
>>>>>
>>>>>FINEST: [{job.queueName=high,
>>>>>job.instanceClassName=org.apache.oodt.cas.workflow.structs.TaskJob,
>>>>>job.name=urn:oodt:FileConcatenator,
>>>>>job.id=, job.status=,
>>>>>job.load=2,
>>>>>job.inputClassName=org.apache.oodt.cas.workflow.structs.TaskJobInput},
>>>>>{task.instance.class=org.apache.oodt.pge.examples.fileconcatenator.FileConcatenatorPGETask,
>>>>>task.config={PGETask_ConfigFilePath=null/file_concatenator/pge-configs/PGEConfig.xml,
>>>>>PCS_ClientTransferServiceFactory=org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory,
>>>>>PCS_ActionRepoFile=file:/Users/Adarsh/oodt-deploy/crawler/policy/crawler-config.xml,
>>>>>PCS_MetFileExtension=met, PGETask_DumpMetadata=true,
>>>>>PCS_WorkflowManagerUrl=http://localhost:9200,
>>>>>PCS_FileManagerUrl=http://localhost:9000,
>>>>>PGETask_Name=FileConcatenator},
>>>>>task.metadata={TaskId=[urn:oodt:FileConcatenator],
>>>>>WorkflowManagerUrl=[http://Adarshs-MacBook-Pro.local:9200],
>>>>>JobId=[a551fd81-5e3c-11e4-b229-73fd473a7137], RunID=[testNumber2],
>>>>>ProcessingNode=[Adarshs-MacBook-Pro.local],
>>>>>WorkflowInstId=[a551fd81-5e3c-11e4-b229-73fd473a7137]}}]
>>>>>
>>>>>Kindly let us know, since the exception is a single line and we are not
>>>>>able to figure out the source.
>>>>>Regards
>>>>>Gautham
>>>>>
>>>>>
>>>>>On Mon, Oct 27, 2014 at 4:11 PM, Gautham Gowrishankar
>>>>><[email protected]> wrote:
>>>>>
>>>>>Hi Prof,
>>>>>
>>>>>
>>>>>We were trying out the CAS-PGE learn-by-example tutorial, but 15 minutes
>>>>>after starting the workflow only 33% has been completed.
>>>>>
>>>>>
>>>>>Attached is the screenshot. Kindly let us know whether we are on the
>>>>>right track.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>Regards
>>>>>Gautham
>>>>>
>>>>>
>>>>>On Mon, Oct 27, 2014 at 11:51 AM, Gautham Gowrishankar
>>>>><[email protected]> wrote:
>>>>>
>>>>>Hi Professor,
>>>>>
>>>>>
>>>>>I have issues running ./querytool. The following lines are what the
>>>>>error seems to be pointing to:
>>>>>
>>>>>
>>>>>==================
>>>>>"$_RUNJAVA" $JAVA_OPTS $OODT_OPTS \
>>>>>  -Djava.endorsed.dirs=../lib \
>>>>>  org.apache.oodt.cas.filemgr.tools.QueryTool "$@"
>>>>>
>>>>>
>>>>>Any idea what the issue might be?
>>>>>
>>>>>
>>>>>Regards
>>>>>Gautham
>>>>>
>>>>>
>>>>>
>>>>>On Mon, Oct 20, 2014 at 7:45 PM, Christian Alan Mattmann
>>>>><[email protected]> wrote:
>>>>>
>>>>>Hi Gautham,
>>>>>
>>>>>Thanks for your question - one of the main reasons is that
>>>>>you can keep track of the upstream provenance using OODT
>>>>>which may or may not aid you in your ranking computation
>>>>>later on. There are some other things (e.g., automated
>>>>>benchmarking and so forth) that OODT provides.
>>>>>
>>>>>ETLlib also has an easy poster that I'd like you guys
>>>>>to try; it's just as easy to use as post.jar (if not easier).
>>>>>
>>>>>Cheers,
>>>>>Chris
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>-----Original Message-----
>>>>>From: Gautham Gowrishankar <[email protected]>
>>>>>Date: Monday, October 20, 2014 at 5:07 PM
>>>>>To: Chris Mattmann <[email protected]>
>>>>>Subject: Regarding Assignment 2
>>>>>
>>>>>>Hello Professor,
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>I had a question regarding the index:
>>>>>>why were we asked to index the JSON files using OODT and ETLlib when
>>>>>>Solr has the capability to perform automatic indexing, for example
>>>>>>using post.jar?
>>>>>>
>>>>>>
>>>>>>Regards
>>>>>>Gautham
>>>>>>
>>>>>>
>
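
On the recip() and date-boosting question raised in the thread, here is a minimal Python sketch of how such a function query might be attached to a Solr request. The core URL, the field name postedDate, and the helper name are assumptions for illustration only. The boost expression follows the commonly cited Solr pattern recip(ms(NOW,field),3.16e-11,1,1), where recip(x,m,a,b) = a/(m*x+b) and 3.16e-11 is roughly one over the number of milliseconds in a year, so the boost decays as a document ages.

```python
from urllib.parse import urlencode

# Hypothetical Solr core URL; adjust to the local deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_date_boosted_query(text, date_field="postedDate", rows=10):
    """Return a Solr select URL that favors recently posted documents.

    Uses the edismax parser with an additive boost function (bf).
    The date field name is an assumption, not taken from the dataset.
    """
    boost = "recip(ms(NOW,{0}),3.16e-11,1,1)".format(date_field)
    params = {
        "q": text,
        "defType": "edismax",
        "bf": boost,
        "rows": rows,
        "wt": "json",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # "title:ingeniero" is a made-up query used only for illustration.
    print(build_date_boosted_query("title:ingeniero"))
```

The sketch only builds the URL; sending it (for example with urllib.request.urlopen) and tuning the constants against real results is left to the reader.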

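For Query 2 in the thread (how quickly jobs are filled, by region), the comparison could be sketched in Python as below. It assumes, hypothetically, that each job record carries location, firstSeenDate, and lastSeenDate fields with dates in YYYY-MM-DD form, and it treats the gap between the two dates as a proxy for time-to-fill. The real JSON field names and date formats in the dataset may differ.

```python
from collections import defaultdict
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # assumed date format, not confirmed by the dataset


def days_open(job):
    """Days between the first and last time a posting was seen."""
    first = datetime.strptime(job["firstSeenDate"], DATE_FORMAT)
    last = datetime.strptime(job["lastSeenDate"], DATE_FORMAT)
    return (last - first).days


def mean_days_open_by_region(jobs):
    """Average days_open per location (field names are hypothetical)."""
    totals = defaultdict(lambda: [0, 0])  # region -> [sum of days, job count]
    for job in jobs:
        entry = totals[job["location"]]
        entry[0] += days_open(job)
        entry[1] += 1
    return {region: total / count for region, (total, count) in totals.items()}


if __name__ == "__main__":
    # Made-up records, for illustration only.
    sample = [
        {"location": "Lima", "firstSeenDate": "2014-10-01", "lastSeenDate": "2014-10-11"},
        {"location": "Lima", "firstSeenDate": "2014-10-01", "lastSeenDate": "2014-10-05"},
        {"location": "Quito", "firstSeenDate": "2014-09-01", "lastSeenDate": "2014-09-02"},
    ]
    print(mean_days_open_by_region(sample))
```

A per-region mean is only one possible statistic; a median or a per-job-type breakdown may be more robust if a few postings stay open for a very long time.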