[jira] [Created] (LUCENE-5949) Add Accountable.getChildResources()

Robert Muir (JIRA) Mon, 15 Sep 2014 10:00:58 -0700

Robert Muir created LUCENE-5949:
-----------------------------------

             Summary: Add Accountable.getChildResources()
                 Key: LUCENE-5949
                 URL: https://issues.apache.org/jira/browse/LUCENE-5949
             Project: Lucene - Core
          Issue Type: Task
            Reporter: Robert Muir



Since Lucene 4.5, you can see how much memory Lucene is using at a basic level 
by looking at SegmentReader.ramBytesUsed().

In 4.11 it's already improved: you can pull the codec producers and get RAM 
usage split out by postings, norms, docvalues, stored fields, term vectors, etc.

Unfortunately, most toString() implementations are fairly useless, so you have 
no insight beyond that, even though behind the scenes it's mostly just adding 
up other Accountables.

So if we improve the toString() implementations, and if an Accountable can 
return its children, we can connect all the dots and you can easily 
diagnose/debug issues and see what is going on. I know I've been frustrated 
with having to hack up tons of System.out.println calls during development to 
see this stuff.

So I think we should add this method to Accountable:
{code}
  /**
   * Returns nested resources of this class. 
   * The result should be a point-in-time snapshot (to avoid race conditions).
   * @see Accountables
   */
  // TODO: on java8 make this a default method returning emptyList
  Iterable<? extends Accountable> getChildResources();
{code}
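
As a rough illustration of how a composite might implement the proposed method, here is a minimal sketch. The {{Accountable}} interface below is a local stand-in that mirrors the proposal, and {{FieldIndex}} and {{SegmentIndex}} are hypothetical names for illustration, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Local stand-in mirroring the proposed interface (not Lucene's own class).
interface Accountable {
  long ramBytesUsed();
  Iterable<? extends Accountable> getChildResources();
}

// Hypothetical leaf resource: reports a fixed size and has no children.
final class FieldIndex implements Accountable {
  private final long bytes;

  FieldIndex(long bytes) { this.bytes = bytes; }

  @Override public long ramBytesUsed() { return bytes; }

  @Override public Iterable<? extends Accountable> getChildResources() {
    return Collections.emptyList();
  }
}

// Hypothetical parent: its usage is simply the sum of its children's usage.
final class SegmentIndex implements Accountable {
  private final List<FieldIndex> fields = new ArrayList<>();

  synchronized void add(FieldIndex field) { fields.add(field); }

  @Override public synchronized long ramBytesUsed() {
    long sum = 0;
    for (FieldIndex field : fields) {
      sum += field.ramBytesUsed();
    }
    return sum;
  }

  @Override public synchronized Iterable<? extends Accountable> getChildResources() {
    // Copy the list so callers get a point-in-time snapshot that later add() calls cannot change.
    return Collections.unmodifiableList(new ArrayList<>(fields));
  }
}
```

Copying the child list under the same lock that guards mutation is one simple way to honor the "point-in-time snapshot" wording in the javadoc; other strategies are possible.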

We can also add a simple helper method for quick debugging, 
{{Accountables.toString(Accountable)}}, to print the "tree". Example output for 
a Lucene segment:
{noformat}
_5f(5.0.0):C8330469: 36.4 MB
|-- postings [PerFieldPostings(formats=1)]: 8 MB
    |-- format 'Lucene41_0' [BlockTreeTermsReader(fields=6,delegate=Lucene41PostingsReader(positions=true,payloads=false))]: 8 MB
        |-- field 'alternatenames' [BlockTreeTerms(terms=3360242,postings=13779349,positions=17102250,docs=2876726)]: 945.2 KB
            |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=23318,arcs=66497)]: 945.1 KB
        |-- field 'asciiname' [BlockTreeTerms(terms=2451266,postings=16849659,positions=16891234,docs=8329981)]: 686.1 KB
            |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=12976,arcs=44103)]: 686 KB
        |-- field 'geonameid' [BlockTreeTerms(terms=8363399,postings=33321876,positions=-1,docs=8330469)]: 1.3 MB
            |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=528,arcs=66225)]: 1.3 MB
        |-- field 'latitude' [BlockTreeTerms(terms=8714542,postings=33321876,positions=-1,docs=8330469)]: 1.7 MB
            |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=854,arcs=77300)]: 1.7 MB
        |-- field 'longitude' [BlockTreeTerms(terms=11557222,postings=33321876,positions=-1,docs=8330469)]: 2.6 MB
            |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=1577,arcs=114570)]: 2.6 MB
        |-- field 'name' [BlockTreeTerms(terms=2598879,postings=16833071,positions=16874267,docs=8330325)]: 771.5 KB
            |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=13790,arcs=46514)]: 771.3 KB
        |-- delegate [Lucene41PostingsReader(positions=true,payloads=false)]: 32 bytes
|-- norms [Lucene49NormsProducer(fields=3,active=3)]: 15.9 MB
    |-- field 'alternatenames' [byte array]: 7.9 MB
    |-- field 'asciiname' [table compressed [Packed64SingleBlock4(bitsPerValue=4,size=8330469,blocks=520655)]]: 4 MB
    |-- field 'name' [table compressed [Packed64SingleBlock4(bitsPerValue=4,size=8330469,blocks=520655)]]: 4 MB
|-- docvalues [PerFieldDocValues(formats=1)]: 12.1 MB
    |-- format 'Lucene410_0' [Lucene410DocValuesProducer(fields=5)]: 12.1 MB
        |-- addresses field 'alternatenames' [MonotonicBlockPackedReader(blocksize=16384,size=407026,avgBPV=16)]: 808.5 KB
        |-- addresses field 'asciiname' [MonotonicBlockPackedReader(blocksize=16384,size=330528,avgBPV=17)]: 698.6 KB
        |-- addresses field 'name' [MonotonicBlockPackedReader(blocksize=16384,size=335020,avgBPV=17)]: 703.7 KB
        |-- ord index field 'alternatenames' [MonotonicBlockPackedReader(blocksize=16384,size=8330470,avgBPV=9)]: 9.8 MB
        |-- reverse index field 'alternatenames' [ReverseTermsIndex(size=6360)]: 77.9 KB
            |-- term bytes [PagedBytes(blocksize=32768)]: 67.7 KB
            |-- term addresses [MonotonicBlockPackedReader(blocksize=16384,size=6360,avgBPV=13)]: 10.2 KB
        |-- reverse index field 'asciiname' [ReverseTermsIndex(size=5165)]: 60.1 KB
            |-- term bytes [PagedBytes(blocksize=32768)]: 53 KB
            |-- term addresses [MonotonicBlockPackedReader(blocksize=16384,size=5165,avgBPV=11)]: 7 KB
        |-- reverse index field 'name' [ReverseTermsIndex(size=5235)]: 61.2 KB
            |-- term bytes [PagedBytes(blocksize=32768)]: 54.1 KB
            |-- term addresses [MonotonicBlockPackedReader(blocksize=16384,size=5235,avgBPV=11)]: 7.1 KB
|-- stored fields [CompressingStoredFieldsReader(mode=FAST,chunksize=16384)]: 216.3 KB
    |-- stored field index [CompressingStoredFieldsIndexReader(blocks=65)]: 216.3 KB
        |-- doc base deltas: 55.8 KB
        |-- start pointer deltas: 158.9 KB
|-- term vectors [CompressingTermVectorsReader(mode=FAST,chunksize=4096)]: 224 KB
    |-- term vector index [CompressingStoredFieldsIndexReader(blocks=67)]: 224 KB
        |-- doc base deltas: 65.6 KB
        |-- start pointer deltas: 156.8 KB
{noformat}
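
A tree printer along these lines could be sketched as a simple depth-first walk over the child resources. This is only a sketch under assumptions: {{Accountable}} is a local stand-in for the proposed interface, {{ResourceNode}} and {{TreePrinter}} are hypothetical names, and sizes are printed as raw byte counts rather than the human-readable units shown above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Local stand-in mirroring the proposed interface (not Lucene's own class).
interface Accountable {
  long ramBytesUsed();
  Iterable<? extends Accountable> getChildResources();
}

// Hypothetical immutable node used only to build a small example tree.
final class ResourceNode implements Accountable {
  private final String description;
  private final long bytes;
  private final List<ResourceNode> children;

  ResourceNode(String description, long bytes, ResourceNode... children) {
    this.description = description;
    this.bytes = bytes;
    this.children = Collections.unmodifiableList(Arrays.asList(children.clone()));
  }

  @Override public long ramBytesUsed() { return bytes; }
  @Override public Iterable<? extends Accountable> getChildResources() { return children; }
  @Override public String toString() { return description; }
}

// Hypothetical helper: renders an Accountable and its descendants, one line per resource.
final class TreePrinter {
  private TreePrinter() {}

  static String toString(Accountable root) {
    StringBuilder sb = new StringBuilder();
    append(sb, root, 0);
    return sb.toString();
  }

  private static void append(StringBuilder sb, Accountable a, int depth) {
    // The root has no prefix; direct children start at column 0 with "|-- ",
    // and each deeper level is indented by four more spaces.
    for (int i = 1; i < depth; i++) {
      sb.append("    ");
    }
    if (depth > 0) {
      sb.append("|-- ");
    }
    sb.append(a).append(": ").append(a.ramBytesUsed()).append(" bytes\n");
    for (Accountable child : a.getChildResources()) {
      append(sb, child, depth + 1);
    }
  }
}
```

The sketch relies on each resource having a descriptive toString(), which is why improving those goes hand in hand with exposing the children.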

Note that this works for any Accountable, so also e.g. NRTCachingDirectory, 
OrdinalMap, Suggesters, FSTs, and so on. You can also traverse the graph 
yourself and output whatever you want.

To be safe, I define the returned graph to be a "point in time snapshot" that 
is free of race conditions. The Accountables helper methods provide this, and 
they also prevent access (even via cast) to data structures you shouldn't be 
able to get to: they only provide information.
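
One way such a helper could work is to copy the size and children into an opaque wrapper at call time. The following is a sketch under assumptions, not the actual helper: {{Accountable}} is a local stand-in for the proposed interface, and {{Snapshots.snapshot}} and {{MutableBuffer}} are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Local stand-in mirroring the proposed interface (not Lucene's own class).
interface Accountable {
  long ramBytesUsed();
  Iterable<? extends Accountable> getChildResources();
}

// Hypothetical mutable resource whose size can change after a snapshot is taken.
final class MutableBuffer implements Accountable {
  long bytes;

  MutableBuffer(long bytes) { this.bytes = bytes; }

  @Override public long ramBytesUsed() { return bytes; }

  @Override public Iterable<? extends Accountable> getChildResources() {
    return Collections.emptyList();
  }
}

final class Snapshots {
  private Snapshots() {}

  // Hypothetical helper: records the description, size, and children as of this call.
  static Accountable snapshot(final String description, Accountable in) {
    final long bytes = in.ramBytesUsed();
    final List<Accountable> children = new ArrayList<>();
    for (Accountable child : in.getChildResources()) {
      // Recurse so the whole subtree is frozen, not just the top level.
      children.add(snapshot(child.toString(), child));
    }
    final List<Accountable> frozen = Collections.unmodifiableList(children);
    // The anonymous wrapper holds no reference to 'in', so a cast cannot reach it.
    return new Accountable() {
      @Override public long ramBytesUsed() { return bytes; }
      @Override public Iterable<? extends Accountable> getChildResources() { return frozen; }
      @Override public String toString() { return description; }
    };
  }
}
```

For example, after {{Accountable snap = Snapshots.snapshot("buffer", buf)}}, later changes to {{buf.bytes}} do not affect {{snap.ramBytesUsed()}}, and {{snap}} is not an instance of {{MutableBuffer}}.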

Since we aren't on Java 8 yet (and cannot provide a simple default method), 
I think we should instead just add the method to Accountable, but add default 
emptyList() implementations to impacted data structures such as DocIdSet and 
Suggester. The codec APIs are lower level, and there I think it's best to 
leave the method abstract, since they should really be providing useful 
information.
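
The split between the two kinds of base class might look roughly like this. It is a sketch under assumptions: {{Accountable}} is a local stand-in for the proposed interface, and {{SimpleResource}}, {{ProducerBase}}, and {{BitSetResource}} are hypothetical names rather than the real Lucene types.

```java
import java.util.Collections;

// Local stand-in mirroring the proposed interface (not Lucene's own class).
interface Accountable {
  long ramBytesUsed();
  Iterable<? extends Accountable> getChildResources();
}

// Hypothetical high-level base class: subclasses inherit an empty child list,
// which stands in for the default method that pre-Java-8 interfaces cannot have.
abstract class SimpleResource implements Accountable {
  @Override public Iterable<? extends Accountable> getChildResources() {
    return Collections.emptyList();
  }
}

// Hypothetical codec-style base class: the method is redeclared abstract,
// so every concrete producer must describe its own children.
abstract class ProducerBase implements Accountable {
  @Override public abstract Iterable<? extends Accountable> getChildResources();
}

// Hypothetical subclass that only has to report its own size.
final class BitSetResource extends SimpleResource {
  private final long[] bits;

  BitSetResource(int numBits) {
    bits = new long[(numBits + 63) / 64];
  }

  @Override public long ramBytesUsed() {
    // Counts only the backing array's payload: 8 bytes per long.
    return 8L * bits.length;
  }
}
```

With this arrangement, existing subclasses of the high-level class keep compiling unchanged, while a new subclass of the codec-style class fails to compile until it reports its children.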



