[
https://issues.apache.org/jira/browse/NUTCH-563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
julien nioche updated NUTCH-563:
Attachment: diff.BasicQueryFilter.dynamicFields.txt
Include custom fields in BasicQueryFilter
-
Key: NUTCH-563
URL: https://issues.apache.org/jira/browse/NUTCH-563
Project: Nutch
Issue Type: New Feature
Components: searcher
[
https://issues.apache.org/jira/browse/NUTCH-563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532034
]
julien nioche commented on NUTCH-563:
-
As I explained in my message to the dev-list, having a separate plugin for
Injecting Crawl metadata
Key: NUTCH-655
URL: https://issues.apache.org/jira/browse/NUTCH-655
Project: Nutch
Issue Type: Improvement
Components: injector
Reporter: julien nioche
DeleteDuplicates based on crawlDB only
---
Key: NUTCH-656
URL: https://issues.apache.org/jira/browse/NUTCH-656
Project: Nutch
Issue Type: Wish
Components: indexer
Reporter: julien
[
https://issues.apache.org/jira/browse/NUTCH-656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
julien nioche reopened NUTCH-656:
-
I suppose that the SOLR dedup mechanism is valid on a single instance. If the
documents are
[
https://issues.apache.org/jira/browse/NUTCH-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651406#action_12651406
]
julien nioche commented on NUTCH-658:
-
Hi Dogacan,
I am off work for several weeks and
[
https://issues.apache.org/jira/browse/NUTCH-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
julien nioche updated NUTCH-658:
Attachment: ReporterCounter.patch
Hi,
I eventually managed to make the change. The new patch
Hadoop 0.19 requires an update of jets3t
Key: NUTCH-678
URL: https://issues.apache.org/jira/browse/NUTCH-678
Project: Nutch
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: julien
Fetcher2 implementing Tool
--
Key: NUTCH-679
URL: https://issues.apache.org/jira/browse/NUTCH-679
Project: Nutch
Issue Type: Improvement
Components: fetcher
Reporter: julien nioche
[
https://issues.apache.org/jira/browse/NUTCH-679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
julien nioche updated NUTCH-679:
Attachment: Fetcher2.Tool.patch
Patch which makes Fetcher2 implement Tool interface
Fetcher2
[
https://issues.apache.org/jira/browse/NUTCH-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665126#action_12665126
]
julien nioche commented on NUTCH-678:
-
I confirm. Upgrading to 0.6.1 fixed the problem
[
https://issues.apache.org/jira/browse/NUTCH-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665791#action_12665791
]
julien nioche commented on NUTCH-679:
-
I can send a modified version of it once Todd has
SOLR indexer does not set boost on the document
---
Key: NUTCH-682
URL: https://issues.apache.org/jira/browse/NUTCH-682
Project: Nutch
Issue Type: Bug
Components: injector
Affects
[
https://issues.apache.org/jira/browse/NUTCH-656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
julien nioche closed NUTCH-656.
---
Resolution: Duplicate
DeleteDuplicates based on crawlDB only
[
https://issues.apache.org/jira/browse/NUTCH-563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
julien nioche updated NUTCH-563:
Attachment: NUTCH-563.patch
Updated the original patch + added class level javadoc comment +
[
https://issues.apache.org/jira/browse/NUTCH-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672994#action_12672994
]
julien nioche commented on NUTCH-668:
-
at line 173 - shouldn't we return 'url' instead
AlreadyBeingCreatedException with Hadoop 0.19
-
Key: NUTCH-692
URL: https://issues.apache.org/jira/browse/NUTCH-692
Project: Nutch
Issue Type: Bug
Affects Versions: 1.0.0
Reporter:
[
https://issues.apache.org/jira/browse/NUTCH-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674607#action_12674607
]
julien nioche commented on NUTCH-692:
-
I have seen this only in multinode setup and on
Timeout for Parser
--
Key: NUTCH-696
URL: https://issues.apache.org/jira/browse/NUTCH-696
Project: Nutch
Issue Type: Wish
Components: fetcher
Reporter: julien nioche
Priority: Minor
I
Neko1.9.11 goes into a loop
---
Key: NUTCH-700
URL: https://issues.apache.org/jira/browse/NUTCH-700
Project: Nutch
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: julien nioche
Priority:
[
https://issues.apache.org/jira/browse/NUTCH-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675335#action_12675335
]
julien nioche commented on NUTCH-700:
-
Reported to CyberNeko
[
https://issues.apache.org/jira/browse/NUTCH-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675518#action_12675518
]
julien nioche commented on NUTCH-692:
-
I have been investigating this a bit more. Same
Lazy Instanciation of Metadata in CrawlDatum
Key: NUTCH-702
URL: https://issues.apache.org/jira/browse/NUTCH-702
Project: Nutch
Issue Type: Improvement
Affects Versions: 1.0.0
[
https://issues.apache.org/jira/browse/NUTCH-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
julien nioche updated NUTCH-702:
Attachment: lazyMetadataInstanciation.patch
Patch for lazy instantiation of metadata in CrawlDatum
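The attached patch is not reproduced in this listing. As a rough illustration of what lazy instantiation of a metadata map generally looks like, here is a minimal sketch. `LazyMetadataDatum` and its methods are hypothetical names chosen for this example; they are not Nutch's `CrawlDatum` API and not the contents of the patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a record with optional metadata; NOT Nutch's CrawlDatum. */
class LazyMetadataDatum {
  // Stays null until the first write, so records without metadata allocate no map.
  private Map<String, String> metadata;

  /** Creates the backing map on first write only. */
  private Map<String, String> metadataForWrite() {
    if (metadata == null) {
      metadata = new HashMap<>();
    }
    return metadata;
  }

  void putMetadata(String key, String value) {
    metadataForWrite().put(key, value);
  }

  /** Reads never allocate: a missing map simply means "no value". */
  String getMetadata(String key) {
    return metadata == null ? null : metadata.get(key);
  }

  boolean isMetadataAllocated() {
    return metadata != null;
  }

  public static void main(String[] args) {
    LazyMetadataDatum datum = new LazyMetadataDatum();
    System.out.println(datum.isMetadataAllocated()); // false: nothing written yet
    datum.putMetadata("depth", "2");
    System.out.println(datum.getMetadata("depth")); // 2
  }
}
```

The point of the pattern is that a read on an object that never received metadata costs a null check rather than a map allocation, which matters when very large numbers of such objects are created.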
[
https://issues.apache.org/jira/browse/NUTCH-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676658#action_12676658
]
julien nioche commented on NUTCH-696:
-
I was thinking along the lines of your first
[
https://issues.apache.org/jira/browse/NUTCH-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
julien nioche updated NUTCH-702:
Attachment: (was: lazyMetadataInstanciation.patch)
Lazy Instanciation of Metadata in
[
https://issues.apache.org/jira/browse/NUTCH-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
julien nioche updated NUTCH-702:
Attachment: NUTCH-702.patch
Patch for lazy instantiation of metadata in CrawlDatum (replaces
[
https://issues.apache.org/jira/browse/NUTCH-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678314#action_12678314
]
Julien Nioche commented on NUTCH-709:
-
do you know the URL of the document causing this
[
https://issues.apache.org/jira/browse/NUTCH-709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-709:
Attachment: JSParseFilter.error.patch
This patch catches errors in the walk method of JSParser and
[
https://issues.apache.org/jira/browse/NUTCH-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678325#action_12678325
]
Julien Nioche commented on NUTCH-709:
-
the patch above does not fix the issue but
[
https://issues.apache.org/jira/browse/NUTCH-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679549#action_12679549
]
Julien Nioche commented on NUTCH-709:
-
Hi Tim,
did you have a look at the logs to see
ParseOutputFormat should catch java.net.MalformedURLException coming from
normalizers
-
Key: NUTCH-712
URL: https://issues.apache.org/jira/browse/NUTCH-712
Project:
[
https://issues.apache.org/jira/browse/NUTCH-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-712:
Attachment: ParseOutputFormat-NUTCH712.patch
ParseOutputFormat should catch
[
https://issues.apache.org/jira/browse/NUTCH-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-712:
Attachment: (was: ParseOutputFormat-NUTCH712.patch)
ParseOutputFormat should catch
[
https://issues.apache.org/jira/browse/NUTCH-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-712:
Attachment: ParseOutputFormat-NUTCH712v2.patch
Modified version of the patch : if normalizers
[
https://issues.apache.org/jira/browse/NUTCH-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694942#action_12694942
]
Julien Nioche commented on NUTCH-692:
-
As I pointed out in my previous message the root
[
https://issues.apache.org/jira/browse/NUTCH-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695346#action_12695346
]
Julien Nioche commented on NUTCH-692:
-
setting mapred.task.timeout to a small value
[
https://issues.apache.org/jira/browse/NUTCH-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695394#action_12695394
]
Julien Nioche commented on NUTCH-721:
-
The message about the Aborted hung threads looks
[
https://issues.apache.org/jira/browse/NUTCH-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-731:
Attachment: NUTCH-731.patch
Redirection of robots.txt in RobotRulesParser
[
https://issues.apache.org/jira/browse/NUTCH-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696958#action_12696958
]
Julien Nioche commented on NUTCH-692:
-
I haven't had the time to try it on the SVN
[
https://issues.apache.org/jira/browse/NUTCH-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702326#action_12702326
]
Julien Nioche commented on NUTCH-477:
-
Having a scope for the URL filters could be
[
https://issues.apache.org/jira/browse/NUTCH-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702412#action_12702412
]
Julien Nioche commented on NUTCH-692:
-
OK I had the same problem again on my main
[
https://issues.apache.org/jira/browse/NUTCH-731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712907#action_12712907
]
Julien Nioche commented on NUTCH-731:
-
I don't have a specific example now, in all the
[
https://issues.apache.org/jira/browse/NUTCH-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-702:
Attachment: NUTCH-702.patch.v2
Fixed bug reported by Dmitry Lihachev
Lazy Instanciation of
[
https://issues.apache.org/jira/browse/NUTCH-731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722214#action_12722214
]
Julien Nioche commented on NUTCH-731:
-
Here is an example which the patch helps
[
https://issues.apache.org/jira/browse/NUTCH-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741082#action_12741082
]
Julien Nioche commented on NUTCH-721:
-
I had another look at this issue after applying
[
https://issues.apache.org/jira/browse/NUTCH-721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-721:
Attachment: NUTCH-721.patch
Sets the default value for fetcher.threads.per.host.by.ip to false
[
https://issues.apache.org/jira/browse/NUTCH-679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-679:
Attachment: NUTCH-679.patch
Updated version of the patch
Fetcher2 implementing Tool
[
https://issues.apache.org/jira/browse/NUTCH-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-696.
---
Resolution: Later
Timeout for Parser
--
Key: NUTCH-696
[
https://issues.apache.org/jira/browse/NUTCH-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748840#action_12748840
]
Julien Nioche commented on NUTCH-702:
-
There have been quite a few related questions on
[
https://issues.apache.org/jira/browse/NUTCH-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748841#action_12748841
]
Julien Nioche commented on NUTCH-702:
-
of course it was meant to be
stats:
original :
Upgrade version of HttpClient
--
Key: NUTCH-751
URL: https://issues.apache.org/jira/browse/NUTCH-751
Project: Nutch
Issue Type: Improvement
Components: fetcher
Reporter: Julien Nioche
The
Prevent new Fetcher to retrieve the robots twice
Key: NUTCH-753
URL: https://issues.apache.org/jira/browse/NUTCH-753
Project: Nutch
Issue Type: Improvement
Components: fetcher
[
https://issues.apache.org/jira/browse/NUTCH-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-753:
Attachment: NUTCH-753.patch
Patch which prevents fetching the robots file twice with the new
[
https://issues.apache.org/jira/browse/NUTCH-751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753175#action_12753175
]
Julien Nioche commented on NUTCH-751:
-
Thanks for the pointer Ken, what will be very
[
https://issues.apache.org/jira/browse/NUTCH-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755956#action_12755956
]
Julien Nioche commented on NUTCH-692:
-
I've been using this patch for a while now and
Use GenericOptionsParser instead of FileSystem.parseArgs()
--
Key: NUTCH-754
URL: https://issues.apache.org/jira/browse/NUTCH-754
Project: Nutch
Issue Type: Improvement
CrawlDatum.set() does not resets Metadata if it is null
---
Key: NUTCH-756
URL: https://issues.apache.org/jira/browse/NUTCH-756
Project: Nutch
Issue Type: Bug
Reporter: Julien
[
https://issues.apache.org/jira/browse/NUTCH-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-756:
Attachment: NUTCH-756.patch
Fixes issue with metadata not being properly overridden for CrawlDatum
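The patch itself is not shown here. The sketch below illustrates the general class of bug the summary suggests, under the assumption (not confirmed by this listing) that a reused object's `set()` copied metadata only when the source had some, leaving stale entries behind otherwise. `ReusableDatum` is a hypothetical class for illustration, not Nutch's `CrawlDatum`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical reusable record; NOT Nutch's CrawlDatum. */
class ReusableDatum {
  // null means "no metadata".
  private Map<String, String> metadata;

  void putMetadata(String key, String value) {
    if (metadata == null) {
      metadata = new HashMap<>();
    }
    metadata.put(key, value);
  }

  String getMetadata(String key) {
    return metadata == null ? null : metadata.get(key);
  }

  /** Overwrites this object's state with a copy of the other's. */
  void set(ReusableDatum other) {
    if (other.metadata != null) {
      metadata = new HashMap<>(other.metadata);
    } else {
      // Assumed fix: without this branch, entries from the previous use
      // of this object would survive the overwrite.
      metadata = null;
    }
  }

  public static void main(String[] args) {
    ReusableDatum reused = new ReusableDatum();
    reused.putMetadata("source", "seed-list");
    reused.set(new ReusableDatum()); // source has no metadata
    System.out.println(reused.getMetadata("source")); // null
  }
}
```

Objects that are recycled across records, as is common in Hadoop-style readers and reducers, make this kind of leftover state easy to miss, because a freshly constructed object never shows the problem.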
[
https://issues.apache.org/jira/browse/NUTCH-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-756:
Summary: CrawlDatum.set() does not reset Metadata if it is null (was:
CrawlDatum.set() does not
Avoid cloningCrawlDatum in CrawlDbReducer
--
Key: NUTCH-761
URL: https://issues.apache.org/jira/browse/NUTCH-761
Project: Nutch
Issue Type: Improvement
Reporter: Julien Nioche
[
https://issues.apache.org/jira/browse/NUTCH-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-761:
Attachment: optiCrawlReducer.patch
Avoid cloningCrawlDatum in CrawlDbReducer
[
https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-762:
Attachment: NUTCH-762-MultiGenerator.patch
Patch for the MultiGenerator
Alternative Generator
Alternative Generator which can generate several segments in one parse of the
crawlDB
-
Key: NUTCH-762
URL: https://issues.apache.org/jira/browse/NUTCH-762
Project:
[
https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-767:
Attachment: NUTCH-767.patch
Update version of Tika for the MimeType detection
Fetcher to skip queues for URLS getting repeated exceptions
-
Key: NUTCH-769
URL: https://issues.apache.org/jira/browse/NUTCH-769
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-769:
Attachment: NUTCH-769.patch
Fetcher to skip queues for URLS getting repeated exceptions
Timebomb for Fetcher
Key: NUTCH-770
URL: https://issues.apache.org/jira/browse/NUTCH-770
Project: Nutch
Issue Type: Improvement
Reporter: Julien Nioche
This patch provides the Fetcher with a timebomb
[
https://issues.apache.org/jira/browse/NUTCH-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-770:
Attachment: NUTCH-770.patch
Timebomb for Fetcher
Key:
[
https://issues.apache.org/jira/browse/NUTCH-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-769:
Attachment: NUTCH-769-2.patch
Fetcher to skip queues for URLS getting repeated exceptions
[
https://issues.apache.org/jira/browse/NUTCH-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783247#action_12783247
]
Julien Nioche commented on NUTCH-769:
-
Missed a couple of lines indeed when I was trying
[
https://issues.apache.org/jira/browse/NUTCH-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783248#action_12783248
]
Julien Nioche commented on NUTCH-770:
-
The log simply shows that the patch has not been
[
https://issues.apache.org/jira/browse/NUTCH-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783612#action_12783612
]
Julien Nioche commented on NUTCH-692:
-
Ok let's leave it open for now
[
https://issues.apache.org/jira/browse/NUTCH-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-770:
Attachment: NUTCH-770-v3.patch
the v2 applied the Lucene code formatting to the whole java file
[
https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-767:
Description:
The version 5 of Tika requires a few changes to the MimeType implementation.
Tika is
[
https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-767:
Description:
The version 0.5 of Tika requires a few changes to the MimeType implementation.
Tika
[
https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-767:
Attachment: NUTCH-767-part2.patch
Fixes compilation issues for test class
[
https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche reopened NUTCH-767:
-
the problem with the test class has been investigated. am reopening the issue
so that we can mark it
[
https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-767:
Attachment: NUTCH-767-part3.patch
the problems with the test come from the fact that tika's
[
https://issues.apache.org/jira/browse/NUTCH-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796133#action_12796133
]
Julien Nioche commented on NUTCH-658:
-
If no one objects I'll commit this one in the
[
https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796221#action_12796221
]
Julien Nioche commented on NUTCH-666:
-
I agree with Sami that this should be contributed
[
https://issues.apache.org/jira/browse/NUTCH-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-658.
-
Resolution: Fixed
Fix Version/s: 1.1
Committed revision 895972
Add Counter for # of doc
[
https://issues.apache.org/jira/browse/NUTCH-655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche reassigned NUTCH-655:
---
Assignee: Julien Nioche
Injecting Crawl metadata
[
https://issues.apache.org/jira/browse/NUTCH-719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche reassigned NUTCH-719:
---
Assignee: Julien Nioche
fetchQueues.totalSize incorrect in Fetcher2
[
https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche reassigned NUTCH-762:
---
Assignee: Julien Nioche
Alternative Generator which can generate several segments in one
[
https://issues.apache.org/jira/browse/NUTCH-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche reassigned NUTCH-692:
---
Assignee: Julien Nioche
AlreadyBeingCreatedException with Hadoop 0.19
[
https://issues.apache.org/jira/browse/NUTCH-655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-655.
-
Resolution: Fixed
Committed revision 896539
Injecting Crawl metadata
[
https://issues.apache.org/jira/browse/NUTCH-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797653#action_12797653
]
Julien Nioche commented on NUTCH-776:
-
Did you notice any improvement in the fetch rate
[
https://issues.apache.org/jira/browse/NUTCH-269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche reassigned NUTCH-269:
---
Assignee: Julien Nioche
CrawlDbReducer: OOME because no upper-bound on inlinks count
[
https://issues.apache.org/jira/browse/NUTCH-269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797990#action_12797990
]
Julien Nioche commented on NUTCH-269:
-
I will shortly commit a variant of this approach
[
https://issues.apache.org/jira/browse/NUTCH-269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-269.
-
Resolution: Fixed
Fix Version/s: 1.1
Committed revision 897180
CrawlDbReducer: OOME
[
https://issues.apache.org/jira/browse/NUTCH-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-767.
---
Resolution: Fixed
Committed revision 897825
Update Tika to v0.5 for the MimeType detection
[
https://issues.apache.org/jira/browse/NUTCH-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-751.
-
Resolution: Later
The changes in the underlying API are quite substantial and this would need a
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798727#action_12798727
]
Julien Nioche commented on NUTCH-766:
-
Hi Chris,
No worries, I'd rather wait for you
Mechanism for passing metadata from parse to crawldb
Key: NUTCH-779
URL: https://issues.apache.org/jira/browse/NUTCH-779
Project: Nutch
Issue Type: New Feature
Reporter:
[
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-779:
Attachment: NUTCH-779
Mechanism for passing metadata from parse to crawldb
[
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802172#action_12802172
]
Julien Nioche commented on NUTCH-779:
-
The property needs some documentation in
[
https://issues.apache.org/jira/browse/NUTCH-778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-778.
-
Resolution: Invalid
Fix Version/s: (was: 1.0.0)
This is likely to be a problem with
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803670#action_12803670
]
Julien Nioche commented on NUTCH-766:
-
I think the end result of this plugin should be