[jira] Created: (NUTCH-794) Tika parser does not keep attributes on html tag

2010-02-16 Thread Julien Nioche (JIRA)
Tika parser does not keep attributes on html tag Key: NUTCH-794 URL: https://issues.apache.org/jira/browse/NUTCH-794 Project: Nutch Issue Type: Bug Reporter: Julien Nioche

[jira] Updated: (NUTCH-794) Tika parser does identify lang attributes on html tag

2010-02-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-794: Description: The following HTML document: <html lang="fi"><head>document 1 title</head><body>jotain

[jira] Updated: (NUTCH-794) Tika parser does identify lang attributes on html tag

2010-02-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-794: Attachment: NUTCH-794.patch Tika parser does identify lang attributes on html tag

[jira] Commented: (NUTCH-794) Language Identification must use check the parse metadata for language values

2010-02-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834147#action_12834147 ] Julien Nioche commented on NUTCH-794: Committed patch in revision 910454 Waiting for

[jira] Updated: (NUTCH-794) Language Identification must use check the parse metadata for language values

2010-02-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-794: Summary: Language Identification must use check the parse metadata for language values (was: Tika

[jira] Work started: (NUTCH-794) Language Identification must use check the parse metadata for language values

2010-02-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-794 started by Julien Nioche. Language Identification must use check the parse metadata for language values

[jira] Updated: (NUTCH-794) Language Identification must use check the parse metadata for language values

2010-02-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-794: Component/s: parser Language Identification must use check the parse metadata for language values

[jira] Updated: (NUTCH-782) Ability to order htmlparsefilters

2010-02-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-782: Component/s: parser Ability to order htmlparsefilters

Hudson build is back to normal : Nutch-trunk #1071

2010-02-16 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/1071/changes

[jira] Commented: (NUTCH-793) search.jsp compile errors

2010-02-16 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834659#action_12834659 ] Hudson commented on NUTCH-793: Integrated in Nutch-trunk #1071 (See

[jira] Commented: (NUTCH-794) Language Identification must use check the parse metadata for language values

2010-02-16 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834657#action_12834657 ] Hudson commented on NUTCH-794: Integrated in Nutch-trunk #1071 (See

[jira] Commented: (NUTCH-766) Tika parser

2010-02-16 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834658#action_12834658 ] Hudson commented on NUTCH-766: Integrated in Nutch-trunk #1071 (See