[jira] [Updated] (NUTCH-2236) Upgrade to Hadoop 2.7.1

2016-02-29 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated NUTCH-2236: Fix Version/s: 1.12 > Upgrade to Hadoop 2.

[jira] [Commented] (NUTCH-2234) Upgrade to elasticsearch 2.1.1

2016-02-26 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169167#comment-15169167 ] Otis Gospodnetic commented on NUTCH-2234: - +1, works for us. > Upgrade to elasticsearch 2.

[jira] [Commented] (NUTCH-1228) Change mapred.task.timeout to mapreduce.task.timeout in fetcher

2016-02-26 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169171#comment-15169171 ] Otis Gospodnetic commented on NUTCH-1228: - I think we are using this with Nutch 1.11, right

[jira] [Updated] (NUTCH-1228) Change mapred.task.timeout to mapreduce.task.timeout in fetcher

2016-02-26 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated NUTCH-1228: Fix Version/s: 1.12 > Change mapred.task.timeout to mapreduce.task.timeout in fetc

[jira] [Commented] (NUTCH-1314) Impose a limit on the length of outlink target urls

2016-02-02 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15128263#comment-15128263 ] Otis Gospodnetic commented on NUTCH-1314: - We've run into this issue with Nutch 1.x and have

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2016-01-19 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108023#comment-15108023 ] Otis Gospodnetic commented on NUTCH-1325: - Median is the same as 50th percentile, isn't it? What

[jira] [Commented] (NUTCH-1233) Rely on Tika for outlink extraction

2016-01-19 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108025#comment-15108025 ] Otis Gospodnetic commented on NUTCH-1233: - My opinion: better to have this in Nutch (the issue

[jira] [Commented] (NUTCH-961) Expose Tika's boilerpipe support

2016-01-18 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106152#comment-15106152 ] Otis Gospodnetic commented on NUTCH-961: Any chance we could commit this, [~markus.jel

[jira] [Commented] (NUTCH-1687) Pick queue in Round Robin

2013-12-22 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855369#comment-13855369 ] Otis Gospodnetic commented on NUTCH-1687: - [~tiennm] - the new class should have

[jira] [Commented] (NUTCH-1314) Impose a limit on the length of outlink target urls

2013-12-20 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854056#comment-13854056 ] Otis Gospodnetic commented on NUTCH-1314: - BTW. we are using this now, too. +1

[jira] [Updated] (NUTCH-1682) Port optionally maintain custom fetch interval despite AdaptiveFetchSchedule to 2.x

2013-12-10 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated NUTCH-1682: Description: Port NUTCH-1388 to 2.x Port optionally maintain custom fetch interval

[jira] [Updated] (NUTCH-1683) Optionally maintain custom fetch interval despite AbstractFetchSchedule

2013-12-10 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated NUTCH-1683: Fix Version/s: 2.3 Optionally maintain custom fetch interval despite

[jira] [Commented] (NUTCH-1326) HostDeduplicator for Nutch

2013-12-09 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843180#comment-13843180 ] Otis Gospodnetic commented on NUTCH-1326: - Should this be for 1.x only

[jira] [Commented] (NUTCH-656) DeleteDuplicates based on crawlDB only

2013-12-08 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842512#comment-13842512 ] Otis Gospodnetic commented on NUTCH-656: This patch was for 1.x only. We've ported

[jira] [Commented] (NUTCH-1556) enabling updatedb to accept batchId

2013-12-04 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839240#comment-13839240 ] Otis Gospodnetic commented on NUTCH-1556: - [~tiennm] it looks like you added

[jira] [Reopened] (NUTCH-1556) enabling updatedb to accept batchId

2013-12-04 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic reopened NUTCH-1556: - Reopening because this issue has a new patch that should be committed. enabling updatedb

[jira] [Updated] (NUTCH-1679) UpdateDb using batchId, link may override crawled page.

2013-12-04 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated NUTCH-1679: Priority: Critical (was: Major) UpdateDb using batchId, link may override crawled page

[jira] [Comment Edited] (NUTCH-1556) enabling updatedb to accept batchId

2013-12-04 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839242#comment-13839242 ] Otis Gospodnetic edited comment on NUTCH-1556 at 12/4/13 7:23 PM

[jira] [Resolved] (NUTCH-1556) enabling updatedb to accept batchId

2013-12-04 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic resolved NUTCH-1556. - Resolution: Fixed Marking as Fixed again because I see the patch that was added

[jira] [Commented] (NUTCH-1667) Updatedb always ignore batchId

2013-12-01 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836174#comment-13836174 ] Otis Gospodnetic commented on NUTCH-1667: - The description mentions batchId

[jira] [Commented] (NUTCH-1672) Inlinks are added twice in DbUpdateReducer

2013-12-01 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836175#comment-13836175 ] Otis Gospodnetic commented on NUTCH-1672: - Yup, looks redundant. [~lewismc

[jira] [Commented] (NUTCH-1674) Use batchId filter to enable scan (GORA-119) for Fetch,Parse,Update,Index

2013-11-28 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834944#comment-13834944 ] Otis Gospodnetic commented on NUTCH-1674: - [~alparslan.avci] - thanks! Did you

[jira] [Commented] (NUTCH-1297) it is better for fetchItemQueues to select items from greater queues first

2013-11-27 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833927#comment-13833927 ] Otis Gospodnetic commented on NUTCH-1297: - bq. I think it was long as in 'has many

[jira] [Updated] (NUTCH-1674) Use batchId filter to enable scan (GORA-119) for Fetch,Parse,Update,Index

2013-11-26 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated NUTCH-1674: Summary: Use batchId filter to enable scan (GORA-119) for Fetch,Parse,Update,Index

[jira] [Commented] (NUTCH-1674) Use batchId filter enable scan (GORA-119) for Fetch,Parse,Update,Index

2013-11-26 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833424#comment-13833424 ] Otis Gospodnetic commented on NUTCH-1674: - [~lewismc] - what do you mean

[jira] [Commented] (NUTCH-1661) Language based crawling

2013-11-26 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833459#comment-13833459 ] Otis Gospodnetic commented on NUTCH-1661: - Can you describe this a bit more? So

[jira] [Updated] (NUTCH-1661) Language based crawling

2013-11-26 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated NUTCH-1661: Labels: PatchAvailable (was: ) Language based crawling

[jira] [Commented] (NUTCH-1297) it is better for fetchItemQueues to select items from greater queues first

2013-11-26 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833461#comment-13833461 ] Otis Gospodnetic commented on NUTCH-1297: - I think [~behnam.nikbakht] is gone

[jira] [Commented] (NUTCH-961) Expose Tika's boilerpipe support

2013-10-08 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789686#comment-13789686 ] Otis Gospodnetic commented on NUTCH-961: Looks like [~kkrugler] is offering to help

[jira] [Commented] (NUTCH-961) Expose Tika's boilerpipe support

2013-10-08 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789894#comment-13789894 ] Otis Gospodnetic commented on NUTCH-961: bq. We don't use it BP anymore What do

[jira] [Commented] (NUTCH-1377) Add option to index via CloudSolrServer instead

2013-08-06 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731484#comment-13731484 ] Otis Gospodnetic commented on NUTCH-1377: - Is this really meant only for 1.8

[jira] [Commented] (NUTCH-945) Indexing to multiple SOLR Servers

2013-08-06 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731488#comment-13731488 ] Otis Gospodnetic commented on NUTCH-945: Should this be closed as a dupe of NUTCH

[jira] [Comment Edited] (NUTCH-945) Indexing to multiple SOLR Servers

2013-08-06 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731488#comment-13731488 ] Otis Gospodnetic edited comment on NUTCH-945 at 8/7/13 12:30 AM

GORA dependency and build failures

2011-04-08 Thread Otis Gospodnetic
Hi, Just curious - is the plan to wait for the GORA 0.1 release to get published somewhere (not familiar with Ivy, so I'm not sure where things need to get published), and then that will automatically fix the failing build? Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene -

Re: Build failed in Jenkins: Nutch-trunk #1419

2011-03-08 Thread Otis Gospodnetic
Hola, Just wondering if you guys know what's causing this? It looks like /export/home/hudson/tools/ant/latest/bin/ant is not there, but is this a known issue? Something infra guys need to fix or? Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem

[jira] Commented: (NUTCH-909) Add alternative search-provider to Nutch site

2010-09-26 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915027#action_12915027 ] Otis Gospodnetic commented on NUTCH-909: Btw. I'll be presenting what's behind

Re: Alternative search box for Nutch site

2010-08-30 Thread Otis Gospodnetic
- Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Otis Gospodnetic ogjunk-nu...@yahoo.com To: dev@nutch.apache.org Sent: Mon, August 9, 2010 4:49:18 PM Subject: Alternative search box for Nutch site Hello, (sending this to d...@nutch

Alternative search box for Nutch site

2010-08-09 Thread Otis Gospodnetic
Hello, (sending this to d...@nutch instead of old nutch-...@lucene) Over at http://search-lucene.com we index Nutch's mailing lists, wiki, web site, source code, javadoc, jira... Would the community be interested in a patch that adds another search option to the search box on