[
https://issues.apache.org/jira/browse/SOLR-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212885#comment-15212885
]
Lewis John McGibbney edited comment on SOLR-8714 at 3/26/16 8:36 AM:
---------------------------------------------------------------------
Hi [~teofili], I started a patch which I thought was sound. The blocker right
now is SOLR-8716.
If we can do the upgrade on Tika, then this issue (with Joshua, for example,
backing statistical machine translation via the [language
packs|http://joshua-decoder.org/language-packs/] we've been generating) is IMHO
a game changer for the way that Web crawlers harvest data and make it
available, useful and ultimately meaningful to us all. Getting Solr to do
statistical machine translation at indexing time would be excellent for the
open source Apache Solr, even though others are of course already doing it.
> Implement translation contrib package for LanguageTranslationUpdateProcessor's
> ------------------------------------------------------------------------------
>
> Key: SOLR-8714
> URL: https://issues.apache.org/jira/browse/SOLR-8714
> Project: Solr
> Issue Type: New Feature
> Reporter: Lewis John McGibbney
> Fix For: master
>
>
> A while back over in Tika we implemented the
> [Translator|https://github.com/apache/tika/blob/master/tika-core/src/main/java/org/apache/tika/language/translate/Translator.java]
> interface. Tika now ships a number of
> [implementations|https://github.com/apache/tika/tree/master/tika-translate/src/main/java/org/apache/tika/language/translate].
>
> This issue will provide a translation contrib package offering a
> LanguageTranslationUpdateProcessor.
> The new processor will probably utilize the existing [Solr Language
> Identifier|https://github.com/apache/lucene-solr/tree/master/solr/contrib/langid]
> and would enable a document to be translated based upon a user-defined
> mapping. LanguageTranslationUpdateProcessor implementations should be
> pluggable and would be placed in an UpdateChain in the same way as
> [LanguageIdentifierUpdateProcessor|https://github.com/apache/lucene-solr/blob/master/solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java] implementations.
> It is my intent to also provide a wiki page which can be referenced and
> maintained in conjunction with the code.
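
To make the proposed flow concrete, here is a minimal, self-contained sketch of the idea described above: look up the detected language in a user-defined mapping and, if a target language is configured, add a translated copy of the text. Everything in it is an assumption for illustration only. `SimpleTranslator` is a hypothetical stand-in rather than Tika's Translator interface, the document is modelled as a plain `Map` rather than a Solr input document, and the field names `text` and `language_s` are placeholders. It does not reflect the actual patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TranslationSketch {

    /** Hypothetical stand-in for a translation backend (not Tika's actual interface). */
    interface SimpleTranslator {
        String translate(String text, String sourceLang, String targetLang);
    }

    /**
     * Returns a copy of the document with a translated field added when the
     * detected language has an entry in the user-defined mapping.
     * The document is left unchanged when no mapping applies.
     */
    static Map<String, String> process(Map<String, String> doc,
                                       String textField,
                                       String langField,
                                       Map<String, String> langMap,
                                       SimpleTranslator translator) {
        Map<String, String> out = new LinkedHashMap<>(doc);
        String source = doc.get(langField); // assumed to be set by an earlier language-identification step
        String text = doc.get(textField);
        if (source == null || text == null) {
            return out;
        }
        String target = langMap.get(source);
        if (target == null || target.equals(source)) {
            return out; // no translation configured for this language
        }
        // Store the translation next to the original, e.g. "text_en" (naming is an assumption).
        out.put(textField + "_" + target, translator.translate(text, source, target));
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("text", "hola mundo");
        doc.put("language_s", "es");

        Map<String, String> langMap = new LinkedHashMap<>();
        langMap.put("es", "en"); // translate Spanish documents into English

        // Fake translator that only tags the text; a real one would call a translation backend.
        SimpleTranslator fake = (text, src, tgt) -> "[" + src + "->" + tgt + "] " + text;

        System.out.println(process(doc, "text", "language_s", langMap, fake));
    }
}
```

Keeping the translator behind a small interface is what would make the processor pluggable: the mapping and the processor logic stay the same whichever backend is configured.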