I have some half-done code for this. Should post a patch this morning.

On Tue, Apr 19, 2011 at 8:40 AM, Ted Dunning wrote:
> Btw... my only suggestion on this particular patch is that it would be a
> bit better to have the log messages on failure back off in frequency so that
> the total number of log messages isn't proportional to the number of errors.
> I tend to do power of ten or 1,2,5 back-off for these things.
>
>
> On Tue, Apr 19, 2011 at 6:25 AM, Christopher Jordan wrote:
>
>> Just to further the point, logging is quite important. While you obviously
>> will not review every log, in a production environment, you certainly will
>> have monitoring scripts check them for ERROR and WARN entries. As well, if
>> you do not want to see the WARN entries from a specific class, you can
>> configure your logger to skip over them.
>>
>> On Apr 19, 2011, at 12:07 AM, Ted Dunning wrote:
>>
>> > I disagree. You should document that you are discarding documents. It is
>> > reasonable to not document every lost document and good to throw an
>> > exception when too many failures occur.
>> >
>> > It is almost inevitable with large data that some inputs are malformed.
>> > These can't stop the show, but you have to know what your exception rate is
>> > so you can detect catastrophic failures.
>> >
>> > On Mon, Apr 18, 2011 at 6:00 PM, Lance Norskog wrote:
>> >
>> >> Please don't log it. Nobody reads logs.
>> >> Right is right and wrong is wrong. Either throw an exception or ignore it.
>> >> You can include a ratio of accepted vectors as an output.
>> >>
>> >> On Mon, Apr 18, 2011 at 5:52 PM, Christopher Jordan wrote:
>> >>> I have incorporated this requested change in a new patch that I attached
>> >>> to ticket https://issues.apache.org/jira/browse/MAHOUT-675.
>> >>>
>> >>> It appears that the previous patch has already been applied. Should I
>> >>> repull the repo, make a new ticket, and create a new patch?
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Chris
>> >>>
>> >>> On Apr 18, 2011, at 1:54 PM, Ted Dunning wrote:
>> >>>
>> >>> That sounds right to me.
>> >>>
>> >>> It might be plausible to blow an exception if a (configurable) large
>> >>> percentage of all documents have to be rejected. That is a minor
>> >>> improvement, though.
>> >>>
>> >>> On Mon, Apr 18, 2011 at 10:52 AM, Christopher Jordan wrote:
>> >>> I believe, at least in my situation, a better approach is for the
>> >>> LuceneIterator to log a warning with the idField when it encounters a
>> >>> problem document and move onto the next one.
>> >>>
>> >>
>> >> --
>> >> Lance Norskog
>> >
>>
>
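A rough sketch of how the suggestions in this thread could fit together: warn on a skipped document with a 1,2,5 back-off (so log volume grows roughly with the log of the error count), throw once a configurable fraction of documents has been rejected, and expose the accepted ratio as an output. `RejectionTracker`, its methods, and the threshold parameters are hypothetical, not Mahout API, and it uses java.util.logging only to stay self-contained rather than the project's own logging setup.

```java
import java.util.logging.Logger;

// Hypothetical helper, not part of Mahout: tracks accepted/rejected documents,
// logs rejections with a 1,2,5 back-off and aborts past a configurable rate.
final class RejectionTracker {
  // java.util.logging keeps the sketch dependency-free.
  private static final Logger LOG = Logger.getLogger(RejectionTracker.class.getName());

  private final double maxRejectedFraction; // abort when rejected/total exceeds this
  private final long minSamples;            // don't judge the rate on tiny samples
  private long accepted;
  private long rejected;
  private long nextLogAt = 1;               // next failure count that gets a log line

  RejectionTracker(double maxRejectedFraction, long minSamples) {
    if (maxRejectedFraction < 0.0 || maxRejectedFraction > 1.0) {
      throw new IllegalArgumentException("maxRejectedFraction must be in [0, 1]");
    }
    if (minSamples < 1) {
      throw new IllegalArgumentException("minSamples must be >= 1");
    }
    this.maxRejectedFraction = maxRejectedFraction;
    this.minSamples = minSamples;
  }

  /** Records a successfully converted document. */
  void accept() {
    accepted++;
    checkRate();
  }

  /**
   * Records a skipped document. Only the 1st, 2nd, 5th, 10th, 20th, 50th, ...
   * failure is logged. Returns true if this particular failure was logged.
   */
  boolean reject(String docId, Exception cause) {
    rejected++;
    boolean logged = false;
    if (rejected == nextLogAt) {
      LOG.warning(String.format("Skipping document %s (%d rejected so far): %s",
          docId, rejected, cause));
      nextLogAt = nextBackoff(nextLogAt);
      logged = true;
    }
    checkRate();
    return logged;
  }

  /** Fraction of documents accepted so far; 1.0 before anything is seen. */
  double acceptedFraction() {
    long total = accepted + rejected;
    return total == 0 ? 1.0 : (double) accepted / total;
  }

  // Throws once enough documents have been seen and too many were rejected.
  private void checkRate() {
    long total = accepted + rejected;
    if (total >= minSamples && (double) rejected / total > maxRejectedFraction) {
      throw new IllegalStateException(String.format(
          "Rejected %d of %d documents, more than the allowed fraction %.3f",
          rejected, total, maxRejectedFraction));
    }
  }

  // Next threshold in the 1, 2, 5, 10, 20, 50, 100, ... sequence.
  static long nextBackoff(long n) {
    long scale = 1;
    while (scale * 10 <= n) {
      scale *= 10;
    }
    long lead = n / scale; // 1, 2 or 5 for values produced by this sequence
    if (lead == 1) {
      return 2 * scale;
    }
    if (lead == 2) {
      return 5 * scale;
    }
    return 10 * scale;
  }
}
```

In an iterator loop, `accept()` would be called after each document converts and `reject(id, e)` in the catch block that skips it. For the power-of-ten variant, `nextBackoff` would simply return `10 * n`.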
