I have some code half done to do this.  Should post a patch this morning.
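
For readers of the archive: the snippet below is not the patch mentioned above. It is only a minimal sketch of the 1,2,5 back-off suggested in the quoted message below, so that a warning is emitted on failure number 1, 2, 5, 10, 20, 50, 100 and so on. The class and method names (BackoffLog, shouldLog) are made up for illustration, and System.err stands in for whatever logger the real code uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mahout: decides which failure counts get logged.
final class BackoffLog {

  /** Returns true when count is 1, 2 or 5 times a power of ten. */
  public static boolean shouldLog(long count) {
    if (count <= 0) {
      return false;
    }
    long n = count;
    // Strip trailing zeros: 200 -> 2, 5000 -> 5, 110 -> 11.
    while (n % 10 == 0) {
      n /= 10;
    }
    return n == 1 || n == 2 || n == 5;
  }

  public static void main(String[] args) {
    List<Long> logged = new ArrayList<>();
    for (long failures = 1; failures <= 1000; failures++) {
      if (shouldLog(failures)) {
        logged.add(failures);
        // Stand-in for a real logger call such as log.warn(...).
        System.err.println("WARN skipped " + failures + " malformed documents so far");
      }
    }
    // Prints [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]
    System.out.println(logged);
  }
}
```

With this rule, 1000 failures produce 10 log lines, so the log volume grows with the logarithm of the error count rather than with the count itself.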

On Tue, Apr 19, 2011 at 8:40 AM, Ted Dunning <[email protected]> wrote:

> Btw... my only suggestion on this particular patch is that it would be a
> bit better to have the log messages on failure back off in frequency so that
> the total number of log messages isn't proportional to the number of errors.
>  I tend to do power of ten or 1,2,5 back-off for these things.
>
>
> On Tue, Apr 19, 2011 at 6:25 AM, Christopher Jordan <[email protected]> wrote:
>
>> Just to further the point, logging is quite important. While you obviously
>> will not review every log, in a production environment, you certainly will
>> have monitoring scripts check them for ERROR and WARN entries. As well, if
>> you do not want to see the WARN entries from a specific class, you can
>> configure your logger to skip over them.
>>
>> On Apr 19, 2011, at 12:07 AM, Ted Dunning wrote:
>>
>> > I disagree.  You should document that you are discarding documents.  It is
>> > reasonable to not document every lost document and good to throw an
>> > exception when too many failures occur.
>> >
>> > It is almost inevitable with large data that some inputs are malformed.
>> > These can't stop the show, but you have to know what your exception rate is
>> > so you can detect catastrophic failures.
>> >
>> > On Mon, Apr 18, 2011 at 6:00 PM, Lance Norskog <[email protected]> wrote:
>> >
>> >> Please don't log it. Nobody reads logs.
>> >> Right is right and wrong is wrong. Either throw an exception or ignore it.
>> >> You can include a ratio of accepted vectors as an output.
>> >>
>> >> On Mon, Apr 18, 2011 at 5:52 PM, Christopher Jordan <[email protected]>
>> >> wrote:
>> >>> I have incorporated this requested change in a new patch that I attached
>> >>> to ticket https://issues.apache.org/jira/browse/MAHOUT-675.
>> >>>
>> >>> It appears that the previous patch has already been applied. Should I
>> >>> repull the repo, make a new ticket, and create a new patch?
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Chris
>> >>>
>> >>> On Apr 18, 2011, at 1:54 PM, Ted Dunning wrote:
>> >>>
>> >>> That sounds right to me.
>> >>>
>> >>> It might be plausible to throw an exception if a (configurable) large
>> >>> percentage of all documents have to be rejected.  That is a minor
>> >>> improvement, though.
>> >>>
>> >>> On Mon, Apr 18, 2011 at 10:52 AM, Christopher Jordan <[email protected]> wrote:
>> >>> I believe, at least in my situation, a better approach is for the
>> >>> LuceneIterator to log a warning with the idField when it encounters a
>> >>> problem document and move on to the next one.
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Lance Norskog
>> >> [email protected]
>> >>
>>
>>
>
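
For readers of the archive: the other idea discussed in the thread above, skipping malformed documents but throwing once a configurable fraction of them has been rejected, might look roughly like the sketch below. Nothing here is taken from the MAHOUT-675 patch. RejectionTracker, maxRejectedFraction and minSeen are hypothetical names, and the minSeen guard (do not judge the rate until enough documents have been seen) is an added assumption so that one bad document at the very start does not abort the run.

```java
// Hypothetical helper, not part of Mahout: counts accepted and rejected
// documents and fails fast when the rejection rate becomes implausibly high.
final class RejectionTracker {
  private final double maxRejectedFraction; // e.g. 0.10 means "more than 10% rejected is fatal"
  private final long minSeen;               // assumption: ignore the rate until this many documents
  private long seen;
  private long rejected;

  public RejectionTracker(double maxRejectedFraction, long minSeen) {
    this.maxRejectedFraction = maxRejectedFraction;
    this.minSeen = minSeen;
  }

  /** Record a document that was converted successfully. */
  public void accept() {
    seen++;
  }

  /** Record a skipped document; throws if too large a fraction has been skipped. */
  public void reject() {
    seen++;
    rejected++;
    if (seen >= minSeen && rejected > maxRejectedFraction * seen) {
      throw new IllegalStateException(
          "rejected " + rejected + " of " + seen + " documents, giving up");
    }
  }

  /** Fraction of documents rejected so far, usable as a job output or counter. */
  public double rejectedFraction() {
    return seen == 0 ? 0.0 : (double) rejected / seen;
  }

  public static void main(String[] args) {
    RejectionTracker tracker = new RejectionTracker(0.10, 100);
    for (int i = 0; i < 1000; i++) {
      if (i % 50 == 0) {
        tracker.reject();   // pretend every 50th document is malformed
      } else {
        tracker.accept();
      }
    }
    System.out.println("rejected fraction: " + tracker.rejectedFraction());
  }
}
```

Reporting rejectedFraction() at the end of the run also covers the suggestion to include a ratio of accepted vectors in the output, independently of whether anything is logged.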
