Re: Architecture options for truncating large documents

Muhammed Olgun Fri, 14 Oct 2016 08:37:58 -0700

Hi Cedric,

I would choose option 1 and create a bash or Python script to
automatically reconfigure MCF for that connector. We could even make that
script open source so everyone can easily add their custom connectors.
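
Here is a minimal sketch of what such a script could do, in Python. It
assumes MCF reads its connector registrations from a connectors.xml file in
which each transformation connector is a single
<transformationconnector name="..." class="..."/> element. That layout, the
function name register_connector, and the class name in the usage note are
assumptions for illustration, not something confirmed in this thread.

```python
import xml.etree.ElementTree as ET


def register_connector(xml_text, name, class_name,
                       tag="transformationconnector"):
    """Return xml_text with a <tag name=... class=...> entry for class_name.

    The entry is appended only if no element with that tag already carries
    the same class, so running the script twice changes nothing.
    The element and attribute names are assumptions about the file layout.
    """
    root = ET.fromstring(xml_text)

    # Already registered: hand back the input untouched.
    for entry in root.iter(tag):
        if entry.get("class") == class_name:
            return xml_text

    # Not registered yet: append a new entry under the root element.
    ET.SubElement(root, tag, {"name": name, "class": class_name})

    # Note: ElementTree drops XML comments and the XML declaration here.
    return ET.tostring(root, encoding="unicode")
```

Called on the text of the file, for example
register_connector(text, "Metadata only", "com.example.MetadataOnlyConnector"),
it returns the updated XML and is safe to run on every install. Because
ElementTree discards comments and the XML declaration when it re-serializes,
a real script may want to preserve those separately.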


Thanks!
Muhammed
On Fri, 14 Oct 2016 at 18:02, Cédric Ulmer <
cedric.ul...@francelabs.com> wrote:

> Hi all,
>
>
>
> We are currently looking at the possibility of truncating large objects
> before indexing them, at the MCF level. For this, we have an architecture
> dilemma, and we are open to the wisdom of the community:
>
>
>
> * What we want to achieve: whenever a document is too large, instead of
> just dropping it completely, we want to be able to index its metadata.
>
>
>
> * How we can achieve that:
>
> Option 1: We create a transformation connector that empties the stream and
> keeps only the metadata. Pros: we don’t modify the MCF code. Cons: every
> time we install MCF somewhere, we need to manually reconfigure the
> transformation connector, as there is no way to automatically import
> transformation connectors.
>
>
>
> Option 2: We modify the standard behavior of the original connector (say,
> the file connector). Instead of offering the option to drop a document if
> it’s larger than size X, we modify it so that it offers to drop the
> content but keep the metadata if the document is larger than size X. Pros:
> it is in the MCF code once and for all, thus available whenever we install
> MCF somewhere new. Cons: it may not be in line with the spirit of
> transformation connectors, and it has to be done for every original
> connector that we are targeting.
>
>
>
> Can you share your thoughts on that?
>
>
>
> Regards,
>
>
>
> Cedric
>
>
>
> President
>
> France Labs – The Search experts
>
> Winner of the EY Internal Search challenge at
> <http://www.vivatechnologyparis.com/> Viva Technologies 2016
>
>  <http://www.francelabs.com/> www.francelabs.com
>
>
