Re: Java logging in Lucene

Shai Erera Fri, 05 Dec 2008 11:38:51 -0800

Have you ever tried to debug your search application after it was shipped to
a customer? When problems occur on the customer's end, you cannot easily
reproduce them: customers don't like to give you access to their systems,
and they are not always willing to share the index with you, let alone the
documents that were indexed.

Logging is very common in products just for that purpose. Of course I can
use debugging when something happens in my development environment. But
that's not the case after the product has shipped.

As for the logging framework, I'd think that Java logging creates no
dependencies for Lucene. java.util.logging has been part of the JDK since
1.4, so it's already there. You might argue that some applications that embed
a search component over Lucene use a different logging system (such as Log4j),
but in that case I think it'd be fair to say that Java logging is what
Lucene uses.
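
For an application that uses a different logging system, one possible way to
route java.util.logging output into it is a custom Handler. The following is
only a sketch under assumptions: the logger name "org.apache.lucene" is
hypothetical (Lucene does not define such a logger), and the Consumer stands
in for whatever logging facade the application really uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class ForwardingDemo {
    // Forwards every accepted record to a sink supplied by the embedding application.
    static final class ForwardingHandler extends Handler {
        private final Consumer<String> appLog;

        ForwardingHandler(Consumer<String> appLog) {
            this.appLog = appLog;
        }

        @Override
        public void publish(LogRecord record) {
            if (!isLoggable(record)) {
                return; // respect this handler's own level and filter
            }
            appLog.accept(record.getLevel() + " " + record.getLoggerName()
                    + ": " + record.getMessage());
        }

        @Override
        public void flush() {}

        @Override
        public void close() {}
    }

    // Installs the handler on a hypothetical "org.apache.lucene" logger.
    // The caller should keep the returned reference so the logger is not collected.
    static Logger route(Consumer<String> appLog) {
        Logger lucene = Logger.getLogger("org.apache.lucene");
        lucene.setUseParentHandlers(false); // do not also print via the root handlers
        lucene.setLevel(Level.INFO);        // explicit level, independent of JVM config
        lucene.addHandler(new ForwardingHandler(appLog));
        return lucene;
    }

    public static void main(String[] args) {
        List<String> appLines = new ArrayList<>(); // stand-in for the application's log
        Logger lucene = route(appLines::add);
        lucene.info("index opened");
        System.out.println(appLines);
    }
}
```

With something like this in place, the embedding application keeps a single
log destination without the library depending on the application's framework.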

You already do it today - you say that you use infoStream which prints
messages. Only the solution in Lucene today cannot be customized. I either
turn on *logging* for the entire Lucene package (or actually just the
indexing part) or not. I cannot, for example, turn on *logging* just for the
merge part.
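
To illustrate the kind of per-component control meant here, below is a
minimal sketch using java.util.logging's hierarchical logger names. All three
logger names are made up for illustration; Lucene does not define them. The
parent logger stays quiet while only the merge logger is opened up.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class PerComponentLogging {
    // Hypothetical logger names; Lucene itself defines no such loggers.
    static final Logger INDEX = Logger.getLogger("org.apache.lucene.index");
    static final Logger MERGE = Logger.getLogger("org.apache.lucene.index.merge");
    static final Logger FLUSH = Logger.getLogger("org.apache.lucene.index.flush");

    static void enableMergeTracingOnly() {
        INDEX.setLevel(Level.WARNING); // indexing as a whole stays quiet
        MERGE.setLevel(Level.FINE);    // detailed messages for merges only
        // FLUSH has no level of its own, so it inherits WARNING from INDEX.
    }

    public static void main(String[] args) {
        enableMergeTracingOnly();
        System.out.println("merge FINE enabled: " + MERGE.isLoggable(Level.FINE));
        System.out.println("flush FINE enabled: " + FLUSH.isLoggable(Level.FINE));
    }
}
```

Note that with the JDK's default configuration the console handler only
prints INFO and above, so actually seeing FINE messages also requires
lowering the handler's level or installing another handler.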

Debugging on the customer side is mostly what I'm after. My experience
with another (proprietary) search library that had exactly the same *logging*
capabilities as Lucene (you either turn logging on or off for everything),
although it contained messages from other parts of the search library as
well, shows that it's extremely difficult to debug what's going on during
search on the customer side. Sometimes all the application can log is that
it added a document with some attributes, but if you really want to
understand what's going on inside Lucene, that's impossible. One useful
piece of information might be the actual tokens that were added to the
index. There's no way the application can tell you that without running the
Analyzer on the text. But then it needs to write code that I think could
have been written in Lucene.
Another useful piece of information is the query that's actually being run. I
guess that printing the Query object produced by QueryParser might be enough,
but you never know.
Maybe you'd like to know which indexes participated in the search, in the
case of a distributed indexing scenario.
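
As a rough sketch of what token-level tracing could look like, the example
below logs the tokens produced for a field at FINE level, guarded by
isLoggable so that nothing is formatted when tracing is off. The analyze
method is a simple stand-in (lowercase, split on whitespace), not Lucene's
Analyzer API, and the logger name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class TokenTrace {
    // Hypothetical logger name; not defined by Lucene.
    static final Logger LOG = Logger.getLogger("org.apache.lucene.analysis");

    // Stand-in for an analyzer: lowercases the text and splits on whitespace.
    static List<String> analyze(String field, String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        // Guard so the message string is only built when tracing is enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("field=" + field + " tokens=" + tokens);
        }
        return tokens;
    }

    // Collects log messages in memory so the demo can show what was logged.
    static final class MemoryLog extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            messages.add(record.getMessage());
        }

        @Override
        public void flush() {}

        @Override
        public void close() {}
    }

    public static void main(String[] args) {
        MemoryLog sink = new MemoryLog();
        LOG.addHandler(sink);
        LOG.setUseParentHandlers(false);
        LOG.setLevel(Level.FINE); // turn tracing on for this component only
        analyze("title", "Java Logging in Lucene");
        System.out.println(sink.messages);
    }
}
```

With the level left at its default, analyze returns the same tokens and logs
nothing, which is the behavior one would want in production.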

And the list can only grow ...

Like I said in my first email, logging is an approach the community has to
decide on, without necessarily going over all the existing code and adding
messages. Those can be added over time, by the many people who'd like to get
detailed information from Lucene.

I hope my intentions are clearer now.

On Fri, Dec 5, 2008 at 9:06 PM, Michael McCandless <
[EMAIL PROTECTED]> wrote:

>
> I also feel that the primary usage of the internal messaging in Lucene
> today is debugging, and we don't need a logging framework for that.
>
> Mike
>
>
> Doug Cutting wrote:
>
>> The infoStream stuff goes back to 1997, before there was log4j or any
>> other Java logging framework.
>>
>> There's never been a big push to add logging to Lucene.  It would add a
>> dependency, and Lucene's jar has always been standalone, which is nice.
>> Dependencies can conflict.  If Lucene requires one version of a dependency,
>> then it may not work well with code that requires a different version of that
>> dependency.
>>
>> And it hasn't been clear which framework to adopt.  Log4j is the
>> granddaddy, then there's Java logging and commons logging.  Today the
>> preferred framework is probably SLF4J.  Good thing we didn't choose the
>> wrong one years ago!
>>
>> And how many log entries would folks really want to see per query or
>> document indexed?  In production I don't think most folks want to see more
>> than one entry per query or document indexed.  So finer-grained logging
>> would be for debugging.  For that one can instead use a debugger.  Hence the
>> traditional lack of demand for detailed logging in Lucene.
>>
>> That's the history as I recall it.  The future is less clear.
>>
>> Doug
>>
>> Grant Ingersoll wrote:
>>
>>> I think the main motivation has always been to have no dependencies in
>>> the core so as to keep it as fast and lightweight as possible.  Then, of
>>> course, there are always the usual religious wars around which logging
>>> framework to use, not to mention the nightmare that is trying to manage
>>> multiple logging frameworks across several projects that are being
>>> integrated.  Then, of course, there is the question of how useful any core
>>> Lucene logs would be to users writing search applications.  For the most
>>> part, my experience has been that I want logging to tell me when a document
>>> was added, when searches occur, etc. but I don't necessarily need to know
>>> things like the fact that Lucene is now entering the analysis phase of
>>> Document inversion.  And, for all these needs, I can just as well do that
>>> logging in the application and not in Lucene.
>>> All that is not to say we couldn't add in logging, I'm just suggesting
>>> reasons I can think of for why it has not been added to date and why I am
>>> not sure it needs to be there going forward.  I believe various other people
>>> have contributed reasons in the past.  I seem to recall Doug spelling some
>>> out, but don't have the thread handy.
>>> -Grant
>>> On Dec 5, 2008, at 1:17 PM, Shai Erera wrote:
>>>
>>>> Hi
>>>>
>>>> I was wondering why the Lucene code doesn't use Java logging, instead
>>>> of the infoStream set in IndexWriter? Today, if I want to enable tracing of
>>>> Lucene code, the only thing I can do is set an infoStream, but then I get
>>>> a great many messages. Moreover, those messages seem to cover indexing code
>>>> only.
>>>>
>>>> I hope to get some opinions on the use of Java logging instead of
>>>> infoStream, and hopefully to start adding logging messages in other places
>>>> in the code (like during search, query parsing etc.)
>>>>
>>>> I feel that this is an approach the community has to decide on before we
>>>> start adding messages to the code. Using Java logging can greatly benefit
>>>> tracing of indexing applications that use Lucene. If the vote is +1 for
>>>> using Java logging, we can start by deprecating infoStream (in 2.9, remove
>>>> in 3.0) and use logging instead.
>>>>
>>>> What do you think?
>>>>
>>>> Shai
>>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>
