[
https://issues.apache.org/jira/browse/NUTCH-817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche reassigned NUTCH-817:
---
Assignee: Julien Nioche
> parse-(html)does follow links of full html page, parse-(tika) does f
[
https://issues.apache.org/jira/browse/NUTCH-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859286#action_12859286
]
Julien Nioche commented on NUTCH-710:
-
As suggested previously we could either treat can
[
https://issues.apache.org/jira/browse/NUTCH-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856349#action_12856349
]
Julien Nioche commented on NUTCH-808:
-
Hi Enis,
{quote}
On the other hand, current impl
[
https://issues.apache.org/jira/browse/NUTCH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-650:
Affects Version/s: (was: 1.0.0)
Fix Version/s: 2.0
> Hbase Integration
> ---
[
https://issues.apache.org/jira/browse/NUTCH-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-808:
Fix Version/s: 2.0
> Evaluate ORM Frameworks which support non-relational column-oriented
> datasto
[
https://issues.apache.org/jira/browse/NUTCH-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-810.
---
Resolution: Fixed
Committed in rev 931098.
http://issues.apache.org/jira/browse/TIKA-317 changed the
[
https://issues.apache.org/jira/browse/NUTCH-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-789:
Component/s: (was: fetcher)
parser
Fix Version/s: (was: 1.1)
Have c
Upgrade to Tika 0.7
---
Key: NUTCH-810
URL: https://issues.apache.org/jira/browse/NUTCH-810
Project: Nutch
Issue Type: Improvement
Components: parser
Affects Versions: 1.0.0
Reporter: Julien Nioche
[
https://issues.apache.org/jira/browse/NUTCH-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-809:
Description:
h2. Parse-metatags plugin
The parse-metatags plugin consists of a HTMLParserFilter whi
[
https://issues.apache.org/jira/browse/NUTCH-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853251#action_12853251
]
Julien Nioche commented on NUTCH-789:
-
Will upgrade as soon as 0.7 is available from
ht
[
https://issues.apache.org/jira/browse/NUTCH-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-809:
Description:
h2. Parse-metatags plugin
The parse-metatags plugin consists of a HTMLParserFilter whi
[
https://issues.apache.org/jira/browse/NUTCH-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-809:
Attachment: NUTCH-809.patch
Modified version of the plugin which is compatible with parse-tika
> Pa
[
https://issues.apache.org/jira/browse/NUTCH-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-809:
Attachment: (was: NUTCH-809.patch)
> Parse-metatags plugin
> -
>
>
[
https://issues.apache.org/jira/browse/NUTCH-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-809:
Attachment: NUTCH-809.patch
> Parse-metatags plugin
> -
>
> Key:
Parse-metatags plugin
-
Key: NUTCH-809
URL: https://issues.apache.org/jira/browse/NUTCH-809
Project: Nutch
Issue Type: New Feature
Components: parser
Reporter: Julien Nioche
Assignee: Jul
[
https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852095#action_12852095
]
Julien Nioche commented on NUTCH-794:
-
The issue has not been fixed in Tika. Will refile
[
https://issues.apache.org/jira/browse/NUTCH-706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-706:
Fix Version/s: (was: 1.1)
Both variants of the substitution rule above break existing tests. Mor
[
https://issues.apache.org/jira/browse/NUTCH-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851545#action_12851545
]
Julien Nioche commented on NUTCH-570:
-
{quote}Julien, want to take this?{quote}
Not par
[
https://issues.apache.org/jira/browse/NUTCH-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851316#action_12851316
]
Julien Nioche commented on NUTCH-789:
-
Shall we postpone the work on this issue to after
[
https://issues.apache.org/jira/browse/NUTCH-714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-714:
Affects Version/s: (was: 0.9.0)
1.0.0
Fix Version/s: (was: 0.8
[
https://issues.apache.org/jira/browse/NUTCH-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-785.
---
Resolution: Fixed
Committed revision 929039
Thanks Andrzej for reviewing it
> Fetcher : copy metadat
[
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-779.
-
Resolution: Fixed
Fix Version/s: 1.1
Committed revision 929038.
Thanks Andrzej for your fe
[
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850915#action_12850915
]
Julien Nioche commented on NUTCH-779:
-
Could anyone please review this issue? I would li
[
https://issues.apache.org/jira/browse/NUTCH-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850912#action_12850912
]
Julien Nioche commented on NUTCH-785:
-
Could anyone please review this issue? I would li
[
https://issues.apache.org/jira/browse/NUTCH-783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-783:
Fix Version/s: (was: 1.1)
Removed tag 1.1
Will rename to IndexingPluginsChecker later
> Indexer
Merge CrawlDBScanner with CrawlDBReader
---
Key: NUTCH-806
URL: https://issues.apache.org/jira/browse/NUTCH-806
Project: Nutch
Issue Type: Improvement
Reporter: Julien Nioche
Assign
[
https://issues.apache.org/jira/browse/NUTCH-784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-784:
Fix Version/s: 1.1
> CrawlDBScanner
> ---
>
> Key: NUTCH-784
>
[
https://issues.apache.org/jira/browse/NUTCH-784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-784.
---
Resolution: Fixed
Committed revision 928746
> CrawlDBScanner
> ---
>
> K
[
https://issues.apache.org/jira/browse/NUTCH-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-776:
Fix Version/s: (was: 1.1)
Moving this issue post 1.1
Needs a patch file, some description of the
[
https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-762.
---
Resolution: Fixed
Committed revision 926155
Have reverted the prefix for params to 'generate.' + adde
[
https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848140#action_12848140
]
Julien Nioche commented on NUTCH-762:
-
The change of prefix also reflected that we now u
[
https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848095#action_12848095
]
Julien Nioche commented on NUTCH-762:
-
{quote}
I just noticed that the new Generator use
[
https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-762:
Attachment: NUTCH-762-v3.patch
new patch which reintroduces the 'generator.update.crawldb' functiona
[
https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-762:
Fix Version/s: 1.1
> Alternative Generator which can generate several segments in one parse of the
[
https://issues.apache.org/jira/browse/NUTCH-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-740.
---
Resolution: Fixed
Assignee: Julien Nioche
Committed in rev 926003
Thanks Marcin for contributing
[
https://issues.apache.org/jira/browse/NUTCH-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-740:
Attachment: NUTCH-740.patch
Slightly modified version of the patch with modifs for protocol-http.
wi
[
https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846930#action_12846930
]
Julien Nioche commented on NUTCH-762:
-
Yes, I came across that situation too on a large
[
https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846910#action_12846910
]
Julien Nioche commented on NUTCH-762:
-
OK, there was indeed an assumption that the gener
[
https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846141#action_12846141
]
Julien Nioche commented on NUTCH-762:
-
If I am not mistaken the point of having _genera
[
https://issues.apache.org/jira/browse/NUTCH-740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845886#action_12845886
]
Julien Nioche commented on NUTCH-740:
-
A nice contribution but should not this be applie
[
https://issues.apache.org/jira/browse/NUTCH-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-469:
Fix Version/s: (was: 1.1)
There have not been any changes to this issue since February 09 and it
[
https://issues.apache.org/jira/browse/NUTCH-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-692.
-
Resolution: Cannot Reproduce
Fix Version/s: 1.1
I cannot reproduce the issue since we moved
[
https://issues.apache.org/jira/browse/NUTCH-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-710:
Fix Version/s: (was: 1.1)
Great idea. Won't be included in 1.1 though so moving to *fix : unknow
[
https://issues.apache.org/jira/browse/NUTCH-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-801.
-
Resolution: Fixed
Committed revision 921840.
> Remove RTF and MP3 parse plugins
> --
[
https://issues.apache.org/jira/browse/NUTCH-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-798.
-
Resolution: Fixed
Updated SOLRJ's dependencies at the same time :
Deleting lib/apache-solr
Remove RTF and MP3 parse plugins
Key: NUTCH-801
URL: https://issues.apache.org/jira/browse/NUTCH-801
Project: Nutch
Issue Type: Improvement
Components: parser
Affects Versions: 1.0.0
[
https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-762:
Attachment: NUTCH-762-v2.patch
Improved version of the patch :
- fixed a few minor bugs
- renamed
[
https://issues.apache.org/jira/browse/NUTCH-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-762:
Attachment: (was: NUTCH-762-MultiGenerator.patch)
> Alternative Generator which can generate sev
[
https://issues.apache.org/jira/browse/NUTCH-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-799.
---
Resolution: Fixed
Assignee: Julien Nioche
Thanks for your feedback Andrzej
Committed revision 9
[
https://issues.apache.org/jira/browse/NUTCH-782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-782.
---
Resolution: Fixed
Committed revision 917557
> Ability to order htmlparsefilters
> ---
[
https://issues.apache.org/jira/browse/NUTCH-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-799:
Attachment: NUTCH-799.patch
> SOLRIndexer to commit once all reducers have finished
> --
SOLRIndexer to commit once all reducers have finished
-
Key: NUTCH-799
URL: https://issues.apache.org/jira/browse/NUTCH-799
Project: Nutch
Issue Type: Improvement
Components: inde
Upgrade to SOLR1.4
--
Key: NUTCH-798
URL: https://issues.apache.org/jira/browse/NUTCH-798
Project: Nutch
Issue Type: Improvement
Components: indexer
Reporter: Julien Nioche
Fix For: 1.1
in
[
https://issues.apache.org/jira/browse/NUTCH-719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837147#action_12837147
]
Julien Nioche commented on NUTCH-719:
-
the other addFetchItem method of FetchItemQueues
[
https://issues.apache.org/jira/browse/NUTCH-719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-719.
---
> fetchQueues.totalSize incorrect in Fetcher2
> ---
>
>
[
https://issues.apache.org/jira/browse/NUTCH-719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-719.
-
Resolution: Fixed
Fix Version/s: 1.1
Committed revision 911905.
Thanks to S. Dennis for inv
[
https://issues.apache.org/jira/browse/NUTCH-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-644.
-
Resolution: Fixed
RTF parsing is now handled by the TikaPlugin (NUTCH-766) which solves the issue
[
https://issues.apache.org/jira/browse/NUTCH-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-705.
-
Resolution: Fixed
RTF parsing is now handled by the TikaPlugin (NUTCH-766). Please open an issue
[
https://issues.apache.org/jira/browse/NUTCH-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-750:
Component/s: parser
> HtmlParser plugin - page title extraction
> --
[
https://issues.apache.org/jira/browse/NUTCH-782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-782:
Component/s: parser
> Ability to order htmlparsefilters
> -
>
>
[
https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-794:
Component/s: parser
> Language Identification must use check the parse metadata for language values
[
https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on NUTCH-794 started by Julien Nioche.
> Language Identification must use check the parse metadata for language values
>
[
https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-794:
Summary: Language Identification must use check the parse metadata for
language values (was: Tika
[
https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834147#action_12834147
]
Julien Nioche commented on NUTCH-794:
-
Committed patch in revision 910454
Waiting for i
[
https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-794:
Attachment: NUTCH-794.patch
> Tika parser does identify lang attributes on html tag
> --
[
https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834143#action_12834143
]
Julien Nioche commented on NUTCH-794:
-
Apart from the html attribute being lost (see abo
[
https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-794:
Description:
The following HTML document :
document 1 title jotain suomeksi
is rendered as the fol
Tika parser does not keep attributes on html tag
Key: NUTCH-794
URL: https://issues.apache.org/jira/browse/NUTCH-794
Project: Nutch
Issue Type: Bug
Reporter: Julien Nioche
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-766.
---
Have added small improvement in revision 910187 (Prioritise default Tika parser
when discovering plugins
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832583#action_12832583
]
Julien Nioche commented on NUTCH-766:
-
@Chris : did you do
ant -f src/plugin/parse-tik
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832564#action_12832564
]
Julien Nioche edited comment on NUTCH-766 at 2/11/10 5:22 PM:
--
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832564#action_12832564
]
Julien Nioche commented on NUTCH-766:
-
I had a closer look at the HTML parsing issue. Wh
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832454#action_12832454
]
Julien Nioche commented on NUTCH-766:
-
@Chris : I just did a fresh co from svn, applied
[
https://issues.apache.org/jira/browse/NUTCH-787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-787:
Fix Version/s: 1.1
> Upgrade Lucene to 3.0.0.
>
>
> Key: NU
[
https://issues.apache.org/jira/browse/NUTCH-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-786.
---
Resolution: Fixed
Committed revision 906907
> Better list of suffix domains
> ---
[
https://issues.apache.org/jira/browse/NUTCH-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-786:
Attachment: NUTCH-786.patch
Small improvement to the content of domain-suffixes.xml : added compound
Better list of suffix domains
-
Key: NUTCH-786
URL: https://issues.apache.org/jira/browse/NUTCH-786
Project: Nutch
Issue Type: Improvement
Affects Versions: 1.0.0
Reporter: Julien Nioche
[
https://issues.apache.org/jira/browse/NUTCH-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828548#action_12828548
]
Julien Nioche commented on NUTCH-781:
-
> did you forget to update conf/tika-mimetypes.xm
[
https://issues.apache.org/jira/browse/NUTCH-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-785:
Attachment: NUTCH-785.patch
> Fetcher : copy metadata from origin URL when redirecting + call
> scf
Fetcher : copy metadata from origin URL when redirecting + call
scfilters.initialScore on newly created URL
---
Key: NUTCH-785
URL: https://issues.apache.org/j
[
https://issues.apache.org/jira/browse/NUTCH-784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-784:
Attachment: NUTCH-784.patch
> CrawlDBScanner
> ---
>
> Key: NUTCH-784
>
CrawlDBScanner
---
Key: NUTCH-784
URL: https://issues.apache.org/jira/browse/NUTCH-784
Project: Nutch
Issue Type: New Feature
Reporter: Julien Nioche
Assignee: Julien Nioche
Attachments: NUTCH-78
[
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-779:
Attachment: NUTCH-779-v2.patch
Improved version of the patch. Followed AB's recommendations and rena
[
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche reassigned NUTCH-779:
---
Assignee: Julien Nioche
> Mechanism for passing metadata from parse to crawldb
> -
[
https://issues.apache.org/jira/browse/NUTCH-783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-783:
Attachment: NUTCH-783.patch
> IndexerChecker Utilty
> -
>
> Key:
[
https://issues.apache.org/jira/browse/NUTCH-783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche reassigned NUTCH-783:
---
Assignee: Julien Nioche
> IndexerChecker Utilty
> -
>
> Ke
IndexerChecker Utilty
-
Key: NUTCH-783
URL: https://issues.apache.org/jira/browse/NUTCH-783
Project: Nutch
Issue Type: New Feature
Components: indexer
Reporter: Julien Nioche
Fix For: 1.
[
https://issues.apache.org/jira/browse/NUTCH-782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-782:
Attachment: NUTCH-782.patch
> Ability to order htmlparsefilters
> -
Ability to order htmlparsefilters
-
Key: NUTCH-782
URL: https://issues.apache.org/jira/browse/NUTCH-782
Project: Nutch
Issue Type: New Feature
Reporter: Julien Nioche
Assignee: Julien N
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-766:
Attachment: (was: Nutch-766.ParserFactory.patch)
> Tika parser
> ---
>
>
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-766:
Attachment: (was: NUTCH-766.tika.patch)
> Tika parser
> ---
>
> Key: NUT
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-766:
Attachment: NUTCH-766-v3.patch
Updated version of the plugin : uses Tika 0.6
> Tika parser
> --
[
https://issues.apache.org/jira/browse/NUTCH-781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-781.
---
> Update Tika to v0.6 for the MimeType detection
> ---
>
>
[
https://issues.apache.org/jira/browse/NUTCH-781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-781.
-
Resolution: Fixed
Committed revision 905228
> Update Tika to v0.6 for the MimeType detection
> -
Update Tika to v0.6 for the MimeType detection
---
Key: NUTCH-781
URL: https://issues.apache.org/jira/browse/NUTCH-781
Project: Nutch
Issue Type: Improvement
Reporter: Julien Nioche
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-766:
Attachment: NUTCH-766.v2
sample.tar.gz
new version of the patch + archive containing
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805892#action_12805892
]
Julien Nioche commented on NUTCH-766:
-
Here is a slightly better version of the patch wh
[
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803670#action_12803670
]
Julien Nioche commented on NUTCH-766:
-
> I think the end result of this plugin should be
[
https://issues.apache.org/jira/browse/NUTCH-778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-778.
-
Resolution: Invalid
Fix Version/s: (was: 1.0.0)
This is likely to be a problem with the
[
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802172#action_12802172
]
Julien Nioche commented on NUTCH-779:
-
> The property needs some documentation in nutch-