Re: Nutch Cosine Filter

2016-12-02 Thread Sujen Shah
Hi, Thank you for your feedback! I appreciate it. Currently, there are no tools apart from the ones you have already experimented with (topN and generate.min.score) to direct the crawl towards the top-scoring URLs. I wonder why generate.min.score did not work. I looked into the code and

[jira] [Resolved] (NUTCH-2327) Seeds injected in REST workflow must be ingested into HDFS

2016-10-25 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah resolved NUTCH-2327. --- Resolution: Fixed > Seeds injected in REST workflow must be ingested into H

[jira] [Created] (NUTCH-2331) REST API Fetch fails to retrieve HDFS path on distributed mode

2016-10-20 Thread Sujen Shah (JIRA)
Sujen Shah created NUTCH-2331: - Summary: REST API Fetch fails to retrieve HDFS path on distributed mode Key: NUTCH-2331 URL: https://issues.apache.org/jira/browse/NUTCH-2331 Project: Nutch

[jira] [Work started] (NUTCH-2327) Seeds injected in REST workflow must be ingested into HDFS

2016-10-20 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2327 started by Sujen Shah. - > Seeds injected in REST workflow must be ingested into H

[jira] [Commented] (NUTCH-2327) Seeds injected in REST workflow must be ingested into HDFS

2016-10-18 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584663#comment-15584663 ] Sujen Shah commented on NUTCH-2327: --- Thanks [~lewismc], I have already started working on this issue, I

[jira] [Created] (NUTCH-2326) Implement InvertLinks job in webui package

2016-10-17 Thread Sujen Shah (JIRA)
Sujen Shah created NUTCH-2326: - Summary: Implement InvertLinks job in webui package Key: NUTCH-2326 URL: https://issues.apache.org/jira/browse/NUTCH-2326 Project: Nutch Issue Type: Task

Re: Plugin dependencies do not get added to classpath while running Nutch in local mode

2016-09-25 Thread Sujen Shah
ive or absolute path. If absolute, it is used > as is. If relative, it is searched for on the classpath. > > > See also my comments on https://github.com/apache/nutch/pull/152 > > Sebastian > > > On 09/23/2016 12:06 AM, Sujen Shah wrote: > > Thank you Sebastian for your

[jira] [Updated] (NUTCH-2317) Plugin jars don't get added to classpath while running in local

2016-09-24 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah updated NUTCH-2317: -- Description: Currently, plugin dependencies listed in the plugin's ivy.xml don't get added

[jira] [Commented] (NUTCH-2317) Plugin jars don't get added to classpath while running in local

2016-09-24 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15519326#comment-15519326 ] Sujen Shah commented on NUTCH-2317: --- [~lewismc], please have a look at this. Thanks > Plugin jars do

[jira] [Created] (NUTCH-2317) Plugin jars don't get added to classpath while running in local

2016-09-24 Thread Sujen Shah (JIRA)
Sujen Shah created NUTCH-2317: - Summary: Plugin jars don't get added to classpath while running in local Key: NUTCH-2317 URL: https://issues.apache.org/jira/browse/NUTCH-2317 Project: Nutch

Re: Plugin dependencies do not get added to classpath while running Nutch in local mode

2016-09-22 Thread Sujen Shah
h/blob/master/src/plugin/ > parse-tika/plugin.xml > > This double work is not ideal and a frequent cause for errors but that's > how it works right now. > > Cheers, > Sebastian > > > On 09/12/2016 11:56 PM, Sujen Shah wrote: > > Hi Devs, > > > > I am facin

Plugin dependencies do not get added to classpath while running Nutch in local mode

2016-09-12 Thread Sujen Shah
to modify the root ivy.xml for plugin-specific dependencies. I wanted to ask the devs first if there was already a solution before filing a JIRA issue. If not, I'll submit it through JIRA. Thank you for your help. Regards, Sujen Shah plugin-dependency.patch Description: Binary data

[jira] [Resolved] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events

2016-09-07 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah resolved NUTCH-2132. --- Resolution: Fixed > Publisher/Subscriber model for Nutch to emit eve

[jira] [Resolved] (NUTCH-2246) Refactor /seed endpoint for backward compatibility

2016-08-22 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah resolved NUTCH-2246. --- Resolution: Fixed > Refactor /seed endpoint for backward compatibil

[jira] [Commented] (NUTCH-2246) Refactor /seed endpoint for backward compatibility

2016-08-22 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431657#comment-15431657 ] Sujen Shah commented on NUTCH-2246: --- Thanks [~wastl-nagel], will follow that from now

[jira] [Work started] (NUTCH-2246) Refactor /seed endpoint for backward compatibility

2016-08-02 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2246 started by Sujen Shah. - > Refactor /seed endpoint for backward compatibil

[jira] [Commented] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events

2016-08-02 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404613#comment-15404613 ] Sujen Shah commented on NUTCH-2132: --- Updated PR and cleaned the commit log. One issue I have not been

[jira] [Commented] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events

2016-08-02 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404523#comment-15404523 ] Sujen Shah commented on NUTCH-2132: --- Okay, got it working. Turns out I had forgotten to add

[jira] [Comment Edited] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events

2016-08-02 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404414#comment-15404414 ] Sujen Shah edited comment on NUTCH-2132 at 8/2/16 5:21 PM: --- It does not throw

[jira] [Commented] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events

2016-08-02 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404414#comment-15404414 ] Sujen Shah commented on NUTCH-2132: --- It does not throw any exceptions, but when I check the number

[jira] [Commented] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events

2016-08-02 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404198#comment-15404198 ] Sujen Shah commented on NUTCH-2132: --- Hi Everyone, I have created an initial PR for this https

[jira] [Commented] (NUTCH-2246) Refactor /seed endpoint for backward compatibility

2016-08-01 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402494#comment-15402494 ] Sujen Shah commented on NUTCH-2246: --- Linking to the PR - https://github.com/apache/nutch/pull/137

[jira] [Resolved] (NUTCH-2245) Developed the NGram Model on the existing Unigram Cosine Similarity Model

2016-04-26 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah resolved NUTCH-2245. --- Resolution: Fixed > Developed the NGram Model on the existing Unigram Cosine Similarity Mo

[jira] [Updated] (NUTCH-2249) WordNet Integration for Cosine Similarity

2016-04-13 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah updated NUTCH-2249: -- Fix Version/s: 1.12 > WordNet Integration for Cosine Similar

[jira] [Updated] (NUTCH-2249) WordNet Integration for Cosine Similarity

2016-04-13 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah updated NUTCH-2249: -- Labels: memex (was: ) > WordNet Integration for Cosine Similar

[jira] [Assigned] (NUTCH-2249) WordNet Integration for Cosine Similarity

2016-04-13 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah reassigned NUTCH-2249: - Assignee: Sujen Shah > WordNet Integration for Cosine Similar

Recent stackoverflow questions

2016-04-05 Thread Sujen Shah
Hey Devs, Just bringing to your attention recent questions asked on Stack Overflow about using Nutch as a service and about how to enable and store cookies. http://stackoverflow.com/questions/36425447/can-nutch-be-deployed-to-crawl-specific-pages

[jira] [Updated] (NUTCH-2245) Developed the NGram Model on the existing Unigram Cosine Similarity Model

2016-04-02 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah updated NUTCH-2245: -- Fix Version/s: 1.12 > Developed the NGram Model on the existing Unigram Cosine Similarity Mo

[jira] [Work started] (NUTCH-2245) Developed the NGram Model on the existing Unigram Cosine Similarity Model

2016-04-02 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2245 started by Sujen Shah. - > Developed the NGram Model on the existing Unigram Cosine Similarity Mo

Re: Reg. License of Princeton WordNet

2016-04-01 Thread Sujen Shah
Hi Bhavya, Could you provide links to the libraries you are trying to leverage. Thanks! On Apr 1, 2016 4:08 PM, "Bhavya Sanghavi" wrote: > Hi, > > I am planning to integrate WordNet in the Scoring Similarity plugin of > Nutch. I just wanted to confirm that there is

[jira] [Updated] (NUTCH-2245) Developed the NGram Model on the existing Unigram Cosine Similarity Model

2016-03-30 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah updated NUTCH-2245: -- Labels: memex (was: ) > Developed the NGram Model on the existing Unigram Cosine Similarity Mo

[jira] [Assigned] (NUTCH-2245) Developed the NGram Model on the existing Unigram Cosine Similarity Model

2016-03-30 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah reassigned NUTCH-2245: - Assignee: Sujen Shah > Developed the NGram Model on the existing Unigram Cosine Similarity Mo

Re: [selenium] running selenium headless

2016-03-28 Thread Sujen Shah
Hi, I can't get much info from the log you have pasted. Some questions: Which browser are you using? Have you tried running the browser alone on the server before running Nutch? Could you please attach the detailed logs from the hadoop.log file? Thanks. Regards, Sujen Shah M.S - Computer Science

[jira] [Commented] (NUTCH-2209) Improved Tokenization for Similarity Scoring plugin

2016-02-11 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143366#comment-15143366 ] Sujen Shah commented on NUTCH-2209: --- There seems to be an issue with the svn-git synchronization

[jira] [Comment Edited] (NUTCH-2209) Improved Tokenization for Similarity Scoring plugin

2016-02-11 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143366#comment-15143366 ] Sujen Shah edited comment on NUTCH-2209 at 2/11/16 7:54 PM: There seems

[jira] [Updated] (NUTCH-2206) Provide example scoring.similarity.stopword.file

2016-01-26 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah updated NUTCH-2206: -- Attachment: NUTCH-2206.patch Hey [~lewismc], here's the patch providing an example for the stopword

[jira] [Commented] (NUTCH-2206) Provide example scoring.similarity.stopword.file

2016-01-26 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117810#comment-15117810 ] Sujen Shah commented on NUTCH-2206: --- Oh yes, will do it now; I missed it in the patch. > Prov

[jira] [Updated] (NUTCH-2206) Provide example scoring.similarity.stopword.file

2016-01-26 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah updated NUTCH-2206: -- Attachment: NUTCH-2206.patch Added example for the property in nutch-default.xml > Provide exam

Re: [VOTE] Moving to Git

2016-01-08 Thread Sujen Shah
+1 Regards, Sujen Shah M.S - Computer Science (Class of 2016) University of Southern California http://www.linkedin.com/in/sujenshah On Fri, Jan 8, 2016 at 2:58 PM, Julien Nioche <lists.digitalpeb...@gmail.com > wrote: > +1 to move to Git > > Note : I don't think Dennis is on

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-15 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059472#comment-15059472 ] Sujen Shah commented on NUTCH-2184: --- This patch requires the segment directory to have all its folders

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-15 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059417#comment-15059417 ] Sujen Shah commented on NUTCH-2184: --- Thanks [~lewismc] > Enable IndexingJob to function with no craw

Re: [MASSMAIL][VOTE] Release Apache Nutch 1.11 RC#2

2015-12-04 Thread Sujen Shah
+1 Regards, Sujen Shah On Fri, Dec 4, 2015 at 10:20 AM, Roannel Fernández Hernández <roan...@uci.cu > wrote: > +1 > > Regards > > --

[jira] [Updated] (NUTCH-2154) Nutch REST API (DB) suffering NullPointerException

2015-10-30 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah updated NUTCH-2154: -- Attachment: NUTCH-2154.patch This patch addresses the Null pointer exceptions and other nulls which

[jira] [Commented] (NUTCH-2152) CommonCrawl dump via Service endpoint

2015-10-29 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981123#comment-14981123 ] Sujen Shah commented on NUTCH-2152: --- So taking this scenario when a dump is called before fetch

[jira] [Commented] (NUTCH-2152) CommonCrawl dump via Service endpoint

2015-10-29 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981137#comment-14981137 ] Sujen Shah commented on NUTCH-2152: --- Ahh yes, the Commoncrawldumper tool (used from the command line

[jira] [Commented] (NUTCH-2152) CommonCrawl dump via Service endpoint

2015-10-29 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981139#comment-14981139 ] Sujen Shah commented on NUTCH-2152: --- Will verify this at the REST end. > CommonCrawl dump via Serv

[jira] [Commented] (NUTCH-2152) CommonCrawl dump via Service endpoint

2015-10-29 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981120#comment-14981120 ] Sujen Shah commented on NUTCH-2152: --- Could you tell me if that job failed, or the Nutch server crashed

[jira] [Commented] (NUTCH-1800) Documentation for Nutch 1.X and 2.X REST APIs

2015-10-29 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981221#comment-14981221 ] Sujen Shah commented on NUTCH-1800: --- Just saw the patch. Have not tried it, but I need to write new test

[jira] [Commented] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events

2015-10-29 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981204#comment-14981204 ] Sujen Shah commented on NUTCH-2132: --- Yes, the errors are logged. > Publisher/Subscriber mo

[jira] [Updated] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events

2015-10-29 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah updated NUTCH-2132: -- Attachment: NUTCH-2132.v2.patch Hi Folks, Just found a few more cases where exceptions are thrown

[jira] [Commented] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events

2015-10-28 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978911#comment-14978911 ] Sujen Shah commented on NUTCH-2132: --- [~ahmadia], bq. One issue I'm having is that if I start a Nutch

[jira] [Commented] (NUTCH-2153) Nutch REST API (DB) uses POST instead of GET to request

2015-10-28 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978826#comment-14978826 ] Sujen Shah commented on NUTCH-2153: --- Hi [~ahmadia] and [~chrismattmann], Currently, while using Nutch

[jira] [Commented] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events

2015-10-28 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978942#comment-14978942 ] Sujen Shah commented on NUTCH-2132: --- Yes, the first patch does not have that property

[jira] [Commented] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events

2015-10-28 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978952#comment-14978952 ] Sujen Shah commented on NUTCH-2132: --- Yes, this is taken care of in the second patch. And, apply

[jira] [Commented] (NUTCH-1800) Documentation for Nutch 1.X and 2.X REST APIs

2015-10-28 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979830#comment-14979830 ] Sujen Shah commented on NUTCH-1800: --- Thanks, Lewis, for this; it is going to be really helpful

[jira] [Created] (NUTCH-2156) Dump via Services end point

2015-10-28 Thread Sujen Shah (JIRA)
Sujen Shah created NUTCH-2156: - Summary: Dump via Services end point Key: NUTCH-2156 URL: https://issues.apache.org/jira/browse/NUTCH-2156 Project: Nutch Issue Type: Sub-task

[jira] [Created] (NUTCH-2151) Service endpoint for REST API

2015-10-27 Thread Sujen Shah (JIRA)
Sujen Shah created NUTCH-2151: - Summary: Service endpoint for REST API Key: NUTCH-2151 URL: https://issues.apache.org/jira/browse/NUTCH-2151 Project: Nutch Issue Type: Sub-task

[jira] [Assigned] (NUTCH-2152) CommonCrawl dump via Service endpoint

2015-10-27 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah reassigned NUTCH-2152: - Assignee: Sujen Shah > CommonCrawl dump via Service endpo

[jira] [Resolved] (NUTCH-2128) Refactor configuration end point

2015-10-27 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah resolved NUTCH-2128. --- Resolution: Fixed > Refactor configuration end po

[jira] [Assigned] (NUTCH-2070) Parameterize Fetch REST Endpoint

2015-10-27 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah reassigned NUTCH-2070: - Assignee: Sujen Shah > Parameterize Fetch REST Endpo

[jira] [Work started] (NUTCH-2152) CommonCrawl dump via Service endpoint

2015-10-27 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2152 started by Sujen Shah. - > CommonCrawl dump via Service endpo

[jira] [Updated] (NUTCH-2151) Service endpoint for REST API

2015-10-27 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah updated NUTCH-2151: -- Issue Type: New Feature (was: Sub-task) Parent: (was: NUTCH-1931) > Service endpo

[jira] [Closed] (NUTCH-2070) Parameterize Fetch REST Endpoint

2015-10-27 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah closed NUTCH-2070. - Resolution: Fixed Implemented as a part of https://issues.apache.org/jira/browse/NUTCH-2099

[jira] [Created] (NUTCH-2152) CommonCrawl dump via Service endpoint

2015-10-27 Thread Sujen Shah (JIRA)
Sujen Shah created NUTCH-2152: - Summary: CommonCrawl dump via Service endpoint Key: NUTCH-2152 URL: https://issues.apache.org/jira/browse/NUTCH-2152 Project: Nutch Issue Type: Sub-task

[jira] [Updated] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events

2015-10-27 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah updated NUTCH-2132: -- Attachment: PubSub_routingkey.patch Patch to route different crawls with different routingkeys set

[jira] [Updated] (NUTCH-2152) CommonCrawl dump via Service endpoint

2015-10-27 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah updated NUTCH-2152: -- Attachment: NUTCH-2152.git.patch Here is the first iteration of the patch. The commoncrawl dump via

[jira] [Resolved] (NUTCH-2149) REST endpoint to read Nutch sequence files

2015-10-25 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah resolved NUTCH-2149. --- Resolution: Fixed > REST endpoint to read Nutch sequence fi

[jira] [Assigned] (NUTCH-2149) REST endpoint to read Nutch sequence files

2015-10-25 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah reassigned NUTCH-2149: - Assignee: Sujen Shah > REST endpoint to read Nutch sequence fi

[jira] [Commented] (NUTCH-2149) REST endpoint to read Nutch sequence files

2015-10-25 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973360#comment-14973360 ] Sujen Shah commented on NUTCH-2149: --- Committed 1710468 > REST endpoint to read Nutch sequence fi

[jira] [Commented] (NUTCH-2149) REST endpoint to read Nutch sequence files

2015-10-25 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973372#comment-14973372 ] Sujen Shah commented on NUTCH-2149: --- Oh, I didn't know that; will do that from now on. Thanks

[jira] [Updated] (NUTCH-2149) REST endpoint to read Nutch sequence files

2015-10-23 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah updated NUTCH-2149: -- Description: This endpoint enables reading of the webgraph data like nodes, links and any other

[jira] [Created] (NUTCH-2149) REST endpoint to read Nutch sequence files

2015-10-23 Thread Sujen Shah (JIRA)
Sujen Shah created NUTCH-2149: - Summary: REST endpoint to read Nutch sequence files Key: NUTCH-2149 URL: https://issues.apache.org/jira/browse/NUTCH-2149 Project: Nutch Issue Type: New Feature

[jira] [Created] (NUTCH-2135) Ant Eclipse build does not include protocol-interactiveselenium

2015-10-09 Thread Sujen Shah (JIRA)
Sujen Shah created NUTCH-2135: - Summary: Ant Eclipse build does not include protocol-interactiveselenium Key: NUTCH-2135 URL: https://issues.apache.org/jira/browse/NUTCH-2135 Project: Nutch

Re: Team 18 : Similarity scoring: goldstandard.txt, stopwords.txt contents

2015-10-07 Thread Sujen Shah
with an example soon. Best, Sujen On Wed, Oct 7, 2015 at 6:52 AM, Christian Alan Mattmann <mattm...@usc.edu> wrote: > Sujen can you provide an example on the existin

Re: Request for inclusion in the Nutch email list

2015-10-02 Thread Sujen Shah
Hi Pramod, To subscribe to the list, you need to send a mail to dev-subscr...@nutch.apache.org. For more instructions, have a look at http://nutch.apache.org/mailing_lists.html Cheers, Sujen Shah On Tue, Sep 29, 2015 at 10:22 PM, Pramod Nagarajarao <pramo...@usc.edu> wrote: > H

[jira] [Updated] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events

2015-10-02 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah updated NUTCH-2132: -- Attachment: NUTCH-2132.patch Attaching a patch which describes my idea for a Pub/Sub model

[jira] [Created] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events

2015-10-02 Thread Sujen Shah (JIRA)
Sujen Shah created NUTCH-2132: - Summary: Publisher/Subscriber model for Nutch to emit events Key: NUTCH-2132 URL: https://issues.apache.org/jira/browse/NUTCH-2132 Project: Nutch Issue Type: New

[jira] [Commented] (NUTCH-2011) Endpoint to support realtime JSON output from the fetcher

2015-10-02 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940925#comment-14940925 ] Sujen Shah commented on NUTCH-2011: --- Hi [~wastl-nagel], [~chrismattmann] and [~ahmadia], Please have

[jira] [Updated] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events

2015-10-02 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah updated NUTCH-2132: -- Description: It would be nice to have a Pub/Sub model in Nutch to emit certain events (e.g., Fetcher

[jira] [Assigned] (NUTCH-2128) Refactor configuration end point

2015-10-01 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah reassigned NUTCH-2128: - Assignee: Sujen Shah > Refactor configuration end po

[jira] [Updated] (NUTCH-2123) Seed List REST API returns Text but headers indicate/require JSON

2015-10-01 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah updated NUTCH-2123: -- Attachment: NUTCH-2123.patch Patch for correcting the response headers. > Seed List REST API retu

SVN-GIT mirror not updated for Revision 1705744

2015-09-30 Thread Sujen Shah
Hi All, The recent commit with the Nutch 1.x Web UI is not mirrored on the GitHub repository, although the commit exists in SVN trunk. I have filed an INFRA ticket: https://issues.apache.org/jira/browse/INFRA-10515. Regards, Sujen Shah

[jira] [Updated] (NUTCH-1966) Configuration endpoint for 1x REST API

2015-09-29 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah updated NUTCH-1966: -- Summary: Configuration endpoint for 1x REST API (was: Configuration endpoint for 1x REST API [A sub

[jira] [Created] (NUTCH-2128) Refactor configuration end point

2015-09-29 Thread Sujen Shah (JIRA)
Sujen Shah created NUTCH-2128: - Summary: Refactor configuration end point Key: NUTCH-2128 URL: https://issues.apache.org/jira/browse/NUTCH-2128 Project: Nutch Issue Type: Sub-task

[jira] [Resolved] (NUTCH-2121) Update javadoc link for Hadoop 2.4.0 in default.properties

2015-09-24 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah resolved NUTCH-2121. --- Resolution: Fixed Committed to trunk (1705205) > Update javadoc link for Hadoop 2.

[jira] [Created] (NUTCH-2119) Eclipse shows build path errors on building Nutch

2015-09-24 Thread Sujen Shah (JIRA)
Sujen Shah created NUTCH-2119: - Summary: Eclipse shows build path errors on building Nutch Key: NUTCH-2119 URL: https://issues.apache.org/jira/browse/NUTCH-2119 Project: Nutch Issue Type: Bug

[jira] [Commented] (NUTCH-2119) Eclipse shows build path errors on building Nutch

2015-09-24 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907326#comment-14907326 ] Sujen Shah commented on NUTCH-2119: --- Committed to trunk (1705203) > Eclipse shows build path err

[jira] [Resolved] (NUTCH-2119) Eclipse shows build path errors on building Nutch

2015-09-24 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujen Shah resolved NUTCH-2119. --- Resolution: Fixed > Eclipse shows build path errors on building Nu

[jira] [Created] (NUTCH-2121) Update javadoc link for Hadoop 2.4.0 in default.properties

2015-09-24 Thread Sujen Shah (JIRA)
Sujen Shah created NUTCH-2121: - Summary: Update javadoc link for Hadoop 2.4.0 in default.properties Key: NUTCH-2121 URL: https://issues.apache.org/jira/browse/NUTCH-2121 Project: Nutch Issue

[jira] [Commented] (NUTCH-2011) Endpoint to support realtime JSON output from the fetcher

2015-09-18 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876639#comment-14876639 ] Sujen Shah commented on NUTCH-2011: --- Hi [~ahmadia], There is an implementation

Re: [ANNOUNCE] New Nutch committer and PMC - Sujen Shah

2015-09-15 Thread Sujen Shah
and Lewis McGibbney at school and at NASA JPL for quite some time now and have been involved in developing the Nutch REST services, focusing capabilities and currently working on the Apache Wicket-based Web UI. I am looking forward to getting engaged with the community even more :) Thanks, Sujen Shah M.S

[jira] [Created] (NUTCH-2099) Refactoring the REST endpoints for integration with webui

2015-09-15 Thread Sujen Shah (JIRA)
Sujen Shah created NUTCH-2099: - Summary: Refactoring the REST endpoints for integration with webui Key: NUTCH-2099 URL: https://issues.apache.org/jira/browse/NUTCH-2099 Project: Nutch Issue Type
