Custom options in nutch crawl script

2016-09-29 Thread Sachin Shaju
/ CrawlTest/ 3* So I would like to know why my first command didn't work. Am I missing anything? Please help. Regards, Sachin Shaju sachi...@mstack.com +919539887554 -- The information contained in this electronic message and any attachments to this message are intended for the exclusive

Re: Nutch in production

2016-09-29 Thread Sachin Shaju
Can I have a link to this? Regards, Sachin Shaju sachi...@mstack.com +919539887554 On Thu, Sep 29, 2016 at 11:13 PM, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote: > Yep also check out the work that Sujen Shah just merged (also on my team at JPL and USC

Re: 90% of URL rejected by filtering (Nutch 2.3.1)

2016-10-05 Thread Sachin Shaju
For the time being you can comment out this line -^.{513,}$ and check. Regards, Sachin Shaju sachi...@mstack.com +919539887554 On Wed, Oct 5, 2016 at 11:41 AM, shubham.gupta <shubham.gu...@orkash.com> wrote: > my current regex-urlfilter properties are as follows: > > #
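
The rule mentioned in this reply can be checked outside Nutch. The sketch below is an illustration only, not Nutch code: it assumes the usual regex-urlfilter convention that a leading `-` marks an exclusion rule and that the rest of the line is the regular expression, so `-^.{513,}$` rejects any URL of 513 or more characters. The function name and the example.com URLs are made up for the demonstration.

```python
import re

# The regex part of the rule "-^.{513,}$" (the leading "-" means "exclude").
# Assumption: the filter rejects a URL when this pattern is found in it.
LONG_URL_RULE = re.compile(r"^.{513,}$")


def rejected_by_length_rule(url: str) -> bool:
    """Return True if the URL would be excluded by the length rule."""
    return LONG_URL_RULE.search(url) is not None


def url_of_length(n: int) -> str:
    """Build a hypothetical URL with exactly n characters."""
    prefix = "http://example.com/"
    return prefix + "a" * (n - len(prefix))


for length in (100, 512, 513, 600):
    url = url_of_length(length)
    print(length, rejected_by_length_rule(url))
```

If a large share of the seed or outlink URLs are longer than 512 characters (long query strings, for example), this single rule could account for many rejections, which is why commenting it out is a reasonable first test.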

Re: Nutch as a service

2016-10-05 Thread Sachin Shaju
expecting it to pick the latest segment automatically. But it is not working that way. The request I've used is: POST /job/create { "type":"FETCH", "confId":"news", "crawlId":"crawl001", "args":

Re: 90% of URL rejected by filtering (Nutch 2.3.1)

2016-10-05 Thread Sachin Shaju
Hi, Can you share your current regex-urlfilter file? Regards, Sachin Shaju sachi...@mstack.com +919539887554 On Wed, Oct 5, 2016 at 11:19 AM, shubham.gupta <shubham.gu...@orkash.com> wrote: > The problem is not yet solved. > > Thanks and Regards > Shubham Gupta

Nutch as a service

2016-10-04 Thread Sachin Shaju
Hi, I would like to know how the Nutch server actually works. Does it use a listener for incoming crawl requests, or is it a continuously running server? Regards, Sachin Shaju sachi...@mstack.com

Unknown issue in Nutch indexer with REST api

2016-10-07 Thread Sachin Shaju
:237) Failed with exit code 255. Any help would be appreciated. PS: After debugging with the stack trace, I think the issue is due to a mismatch in the Guava version. I've tried changing the build.xml of the plugins (parse-tika and parsefilter-naivebayes), but it didn't work. Regards, Sachin Shaju sachi

Re: Nutch as a service

2016-10-07 Thread Sachin Shaju
. Everything works until the index phase. Indexing to Elasticsearch fails with an unknown exception. Please have a look at http://www.mail-archive.com/user%40nutch.apache.org/msg15001.html Regards, Sachin Shaju sachi...@mstack.com On Thu, Oct 6, 2016 at 10:12 PM, Furkan KAMACI <furkan

How to run nutch server on distributed environment

2016-09-29 Thread Sachin Shaju
. Regards, Sachin Shaju sachi...@mstack.com +919539887554

Nutch in production

2016-09-29 Thread Sachin Shaju
as a continuously running distributed server by any other option? My preferred Nutch version is 1.12. Regards, Sachin Shaju sachi...@mstack.com +919539887554

Re: Custom elastic indexer in nutch

2016-11-06 Thread Sachin Shaju
How can I do the same with index.parse.md? Any useful links or a demonstration, please. Regards, Sachin Shaju sachi...@mstack.com +919539887554 On Sat, Nov 5, 2016 at 8:49 PM, MrSrivastavaRK . <srivastav...@gmail.com> wrote: > I am facing the same problem. Thought I would share a workaround,

Re: Custom elastic indexer in nutch

2016-11-07 Thread Sachin Shaju
A detailed answer to the same question: http://stackoverflow.com/questions/40418712/adding-custom-fields-and-types-in-nutch-elastic-indexer/40423485#40423485 Regards, Sachin Shaju sachi...@mstack.com On Fri, Nov 4, 2016 at 2:35 PM, Sachin Shaju <sachi...@mstack.com> wrote: > Hi,