Robert Muir created LUCENE-5949:
-----------------------------------
Summary: Add Accountable.getChildResources()
Key: LUCENE-5949
URL: https://issues.apache.org/jira/browse/LUCENE-5949
Project: Lucene - Core
Issue Type: Task
Reporter: Robert Muir
Since Lucene 4.5, you can see how much memory Lucene is using at a basic level
by looking at SegmentReader.ramBytesUsed().
In 4.11 this has already improved: you can pull the codec producers and get RAM
usage split out by postings, norms, docvalues, stored fields, term vectors, etc.
Unfortunately, most toString() implementations are fairly useless, so you don't
have any insight further than that, even though behind the scenes it's mostly
just adding up other Accountables.
So instead, if we improve the toString() implementations and let an Accountable
return its children, we can connect all the dots, and you can easily
diagnose/debug issues and see what is going on. I know I've been frustrated with
having to hack up tons of System.out.println calls during development to see
this stuff.
So I think we should add this method to Accountable:
{code}
/**
 * Returns nested resources of this class.
 * The result should be a point-in-time snapshot (to avoid race conditions).
 * @see Accountables
 */
// TODO: on java8 make this a default method returning emptyList
Iterable<? extends Accountable> getChildResources();
{code}
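As a rough illustration of what implementing the proposed method might look like, here is a minimal sketch. The nested Accountable interface is a local stand-in that declares only ramBytesUsed() and the proposed getChildResources(), so the example compiles without Lucene. TermIndex and FieldTerms are hypothetical names, and the byte counts are made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ChildResourcesSketch {

  /** Local stand-in for Lucene's Accountable, with the proposed method added. */
  interface Accountable {
    long ramBytesUsed();
    Iterable<? extends Accountable> getChildResources();
  }

  /** Hypothetical leaf: it has no nested resources. */
  static final class TermIndex implements Accountable {
    private final long bytes;
    TermIndex(long bytes) { this.bytes = bytes; }
    @Override public long ramBytesUsed() { return bytes; }
    @Override public Iterable<? extends Accountable> getChildResources() {
      return Collections.emptyList();
    }
    @Override public String toString() { return "term index"; }
  }

  /** Hypothetical parent: its usage is a fixed overhead plus that of its children. */
  static final class FieldTerms implements Accountable {
    private static final long BASE_BYTES = 32; // made-up overhead
    private final List<TermIndex> indexes = new ArrayList<>();

    synchronized void add(TermIndex index) { indexes.add(index); }

    @Override public synchronized long ramBytesUsed() {
      long total = BASE_BYTES;
      for (TermIndex index : indexes) {
        total += index.ramBytesUsed();
      }
      return total;
    }

    @Override public synchronized Iterable<? extends Accountable> getChildResources() {
      // Copy the list so callers get a point-in-time snapshot, not a live view.
      return Collections.unmodifiableList(new ArrayList<>(indexes));
    }
  }

  public static void main(String[] args) {
    FieldTerms field = new FieldTerms();
    field.add(new TermIndex(1000));
    Iterable<? extends Accountable> snapshot = field.getChildResources();
    field.add(new TermIndex(500)); // added later, so not part of the snapshot
    int count = 0;
    for (Accountable child : snapshot) {
      count++;
    }
    System.out.println(count + " child in snapshot, " + field.ramBytesUsed() + " bytes total");
  }
}
```

Copying the child list on each call is the simplest way to get snapshot behavior in this sketch; a real implementation might choose something cheaper.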
We can also add a simple helper method for quick debugging,
{{Accountables.toString(Accountable)}}, that prints the "tree". Example output
for a Lucene segment:
{noformat}
_5f(5.0.0):C8330469: 36.4 MB
|-- postings [PerFieldPostings(formats=1)]: 8 MB
    |-- format 'Lucene41_0' [BlockTreeTermsReader(fields=6,delegate=Lucene41PostingsReader(positions=true,payloads=false))]: 8 MB
        |-- field 'alternatenames' [BlockTreeTerms(terms=3360242,postings=13779349,positions=17102250,docs=2876726)]: 945.2 KB
            |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=23318,arcs=66497)]: 945.1 KB
        |-- field 'asciiname' [BlockTreeTerms(terms=2451266,postings=16849659,positions=16891234,docs=8329981)]: 686.1 KB
            |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=12976,arcs=44103)]: 686 KB
        |-- field 'geonameid' [BlockTreeTerms(terms=8363399,postings=33321876,positions=-1,docs=8330469)]: 1.3 MB
            |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=528,arcs=66225)]: 1.3 MB
        |-- field 'latitude' [BlockTreeTerms(terms=8714542,postings=33321876,positions=-1,docs=8330469)]: 1.7 MB
            |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=854,arcs=77300)]: 1.7 MB
        |-- field 'longitude' [BlockTreeTerms(terms=11557222,postings=33321876,positions=-1,docs=8330469)]: 2.6 MB
            |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=1577,arcs=114570)]: 2.6 MB
        |-- field 'name' [BlockTreeTerms(terms=2598879,postings=16833071,positions=16874267,docs=8330325)]: 771.5 KB
            |-- term index [FST(input=BYTE1,output=ByteSequenceOutputs,packed=false,nodes=13790,arcs=46514)]: 771.3 KB
        |-- delegate [Lucene41PostingsReader(positions=true,payloads=false)]: 32 bytes
|-- norms [Lucene49NormsProducer(fields=3,active=3)]: 15.9 MB
    |-- field 'alternatenames' [byte array]: 7.9 MB
    |-- field 'asciiname' [table compressed [Packed64SingleBlock4(bitsPerValue=4,size=8330469,blocks=520655)]]: 4 MB
    |-- field 'name' [table compressed [Packed64SingleBlock4(bitsPerValue=4,size=8330469,blocks=520655)]]: 4 MB
|-- docvalues [PerFieldDocValues(formats=1)]: 12.1 MB
    |-- format 'Lucene410_0' [Lucene410DocValuesProducer(fields=5)]: 12.1 MB
        |-- addresses field 'alternatenames' [MonotonicBlockPackedReader(blocksize=16384,size=407026,avgBPV=16)]: 808.5 KB
        |-- addresses field 'asciiname' [MonotonicBlockPackedReader(blocksize=16384,size=330528,avgBPV=17)]: 698.6 KB
        |-- addresses field 'name' [MonotonicBlockPackedReader(blocksize=16384,size=335020,avgBPV=17)]: 703.7 KB
        |-- ord index field 'alternatenames' [MonotonicBlockPackedReader(blocksize=16384,size=8330470,avgBPV=9)]: 9.8 MB
        |-- reverse index field 'alternatenames' [ReverseTermsIndex(size=6360)]: 77.9 KB
            |-- term bytes [PagedBytes(blocksize=32768)]: 67.7 KB
            |-- term addresses [MonotonicBlockPackedReader(blocksize=16384,size=6360,avgBPV=13)]: 10.2 KB
        |-- reverse index field 'asciiname' [ReverseTermsIndex(size=5165)]: 60.1 KB
            |-- term bytes [PagedBytes(blocksize=32768)]: 53 KB
            |-- term addresses [MonotonicBlockPackedReader(blocksize=16384,size=5165,avgBPV=11)]: 7 KB
        |-- reverse index field 'name' [ReverseTermsIndex(size=5235)]: 61.2 KB
            |-- term bytes [PagedBytes(blocksize=32768)]: 54.1 KB
            |-- term addresses [MonotonicBlockPackedReader(blocksize=16384,size=5235,avgBPV=11)]: 7.1 KB
|-- stored fields [CompressingStoredFieldsReader(mode=FAST,chunksize=16384)]: 216.3 KB
    |-- stored field index [CompressingStoredFieldsIndexReader(blocks=65)]: 216.3 KB
        |-- doc base deltas: 55.8 KB
        |-- start pointer deltas: 158.9 KB
|-- term vectors [CompressingTermVectorsReader(mode=FAST,chunksize=4096)]: 224 KB
    |-- term vector index [CompressingStoredFieldsIndexReader(blocks=67)]: 224 KB
        |-- doc base deltas: 65.6 KB
        |-- start pointer deltas: 156.8 KB
{noformat}
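A helper along these lines could produce a tree like the one above. This is only a sketch under assumptions, not the proposed Accountables.toString: the Accountable interface is a local stand-in, Node and treeToString are hypothetical names, sizes are printed as raw byte counts rather than human-readable units, and the four-space indent per level is a guess.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class TreePrinterSketch {

  /** Local stand-in for Lucene's Accountable, with the proposed method added. */
  interface Accountable {
    long ramBytesUsed();
    Iterable<? extends Accountable> getChildResources();
  }

  /** Hypothetical immutable node, used only to build a demo tree. */
  static final class Node implements Accountable {
    private final String description;
    private final long bytes;
    private final List<Node> children;

    Node(String description, long bytes, Node... children) {
      this.description = description;
      this.bytes = bytes;
      this.children = Collections.unmodifiableList(Arrays.asList(children.clone()));
    }

    @Override public long ramBytesUsed() { return bytes; }
    @Override public Iterable<? extends Accountable> getChildResources() { return children; }
    @Override public String toString() { return description; }
  }

  /** Renders the resource tree rooted at the given Accountable, one node per line. */
  static String treeToString(Accountable root) {
    StringBuilder sb = new StringBuilder();
    append(sb, root, 0);
    return sb.toString();
  }

  private static void append(StringBuilder sb, Accountable a, int depth) {
    // Indent by four spaces per level below the first, then mark the branch.
    for (int i = 1; i < depth; i++) {
      sb.append("    ");
    }
    if (depth > 0) {
      sb.append("|-- ");
    }
    sb.append(a).append(": ").append(a.ramBytesUsed()).append(" bytes\n");
    for (Accountable child : a.getChildResources()) {
      append(sb, child, depth + 1);
    }
  }

  public static void main(String[] args) {
    Node root = new Node("segment", 300,
        new Node("postings", 200, new Node("term index", 150)),
        new Node("norms", 100));
    System.out.print(treeToString(root));
  }
}
```

Because each node's label comes from its own toString(), the usefulness of the output depends directly on how descriptive those implementations are.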
Note that this works for any Accountable, so it also covers e.g.
NRTCachingDirectory, OrdinalMap, Suggesters, FSTs, and so on. You can also
traverse the graph yourself and output whatever you want.
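For instance, a custom traversal might look for the single leaf that reports the most memory. The sketch below assumes a local stand-in for the Accountable interface; Node and largestLeaf are hypothetical names, and the sizes are invented.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

final class TraversalSketch {

  /** Local stand-in for Lucene's Accountable, with the proposed method added. */
  interface Accountable {
    long ramBytesUsed();
    Iterable<? extends Accountable> getChildResources();
  }

  /** Hypothetical immutable node, used only to build a demo tree. */
  static final class Node implements Accountable {
    private final String description;
    private final long bytes;
    private final List<Node> children;

    Node(String description, long bytes, Node... children) {
      this.description = description;
      this.bytes = bytes;
      this.children = Collections.unmodifiableList(Arrays.asList(children.clone()));
    }

    @Override public long ramBytesUsed() { return bytes; }
    @Override public Iterable<? extends Accountable> getChildResources() { return children; }
    @Override public String toString() { return description; }
  }

  /** Iterative depth-first walk returning the childless node that reports the most bytes. */
  static Accountable largestLeaf(Accountable root) {
    Accountable best = null;
    Deque<Accountable> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      Accountable current = stack.pop();
      boolean hasChildren = false;
      for (Accountable child : current.getChildResources()) {
        hasChildren = true;
        stack.push(child);
      }
      if (!hasChildren && (best == null || current.ramBytesUsed() > best.ramBytesUsed())) {
        best = current;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    Node root = new Node("segment", 300,
        new Node("postings", 200, new Node("term index", 150)),
        new Node("norms", 100));
    System.out.println("largest leaf: " + largestLeaf(root));
  }
}
```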
To be safe, I define the returned graph to be a "point in time snapshot" that
is free of race conditions. The helper methods in Accountables provide this, and
they also prevent access (even via cast) to data structures you shouldn't be
able to reach; they only provide information.
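One way such a helper could work is to copy the description, size, and children into a fresh read-only object, so the caller never holds a reference to the underlying structure. This is a sketch of that idea, not the actual helper: the Accountable interface is a local stand-in, and Cache and snapshot are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SnapshotSketch {

  /** Local stand-in for Lucene's Accountable, with the proposed method added. */
  interface Accountable {
    long ramBytesUsed();
    Iterable<? extends Accountable> getChildResources();
  }

  /** Hypothetical live structure whose reported size changes over time. */
  static final class Cache implements Accountable {
    private long bytes;

    synchronized void grow(long delta) { bytes += delta; }

    @Override public synchronized long ramBytesUsed() { return bytes; }
    @Override public Iterable<? extends Accountable> getChildResources() {
      return Collections.emptyList();
    }
    @Override public String toString() { return "cache"; }
  }

  /**
   * Returns a detached, read-only copy of the source and its children. The copy
   * records values at call time and cannot be cast back to the source type.
   */
  static Accountable snapshot(Accountable source) {
    final String description = source.toString();
    final long bytes = source.ramBytesUsed();
    final List<Accountable> children = new ArrayList<>();
    for (Accountable child : source.getChildResources()) {
      children.add(snapshot(child));
    }
    final List<Accountable> frozen = Collections.unmodifiableList(children);
    return new Accountable() {
      @Override public long ramBytesUsed() { return bytes; }
      @Override public Iterable<? extends Accountable> getChildResources() { return frozen; }
      @Override public String toString() { return description; }
    };
  }

  public static void main(String[] args) {
    Cache cache = new Cache();
    cache.grow(100);
    Accountable view = snapshot(cache);
    cache.grow(50); // later growth does not affect the snapshot
    System.out.println(view + ": " + view.ramBytesUsed() + " bytes (live: "
        + cache.ramBytesUsed() + ")");
  }
}
```

Note that this sketch reads each node separately, so the copy of a whole tree is not atomic; a structure that mutates concurrently would still need its own synchronization to report a consistent view.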
Since we aren't on Java 8 yet (and cannot provide a simple default method),
I think we should instead just add the method to Accountable, but add default
emptyList() implementations to impacted data structures such as DocIdSet and
Suggester. The codec APIs are lower level, and there I think it's best to
leave the method abstract, since they should really be providing useful
information.
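The split described above might look roughly like this. The Accountable interface is a local stand-in, and SimpleSet, Producer, and BitSetBacked are hypothetical stand-ins for a high-level structure, a codec-level API, and a concrete subclass; the size estimate deliberately ignores object headers.

```java
import java.util.Collections;

final class DefaultChildrenSketch {

  /** Local stand-in for Lucene's Accountable, with the proposed method added. */
  interface Accountable {
    long ramBytesUsed();
    Iterable<? extends Accountable> getChildResources();
  }

  /** Hypothetical high-level base class: supplies the empty default for subclasses. */
  abstract static class SimpleSet implements Accountable {
    @Override public Iterable<? extends Accountable> getChildResources() {
      return Collections.emptyList();
    }
  }

  /** Hypothetical codec-level base class: no default, so subclasses must report children. */
  abstract static class Producer implements Accountable {
    // getChildResources() is intentionally left abstract here.
  }

  /** Hypothetical concrete set that inherits the empty default unchanged. */
  static final class BitSetBacked extends SimpleSet {
    private final long[] words;

    BitSetBacked(int numBits) {
      this.words = new long[(numBits + 63) / 64];
    }

    @Override public long ramBytesUsed() {
      return 8L * words.length; // rough estimate: array payload only
    }
  }

  public static void main(String[] args) {
    BitSetBacked set = new BitSetBacked(128);
    boolean hasChildren = set.getChildResources().iterator().hasNext();
    System.out.println(set.ramBytesUsed() + " bytes, has children: " + hasChildren);
  }
}
```

With this arrangement, existing subclasses of the high-level base class keep compiling, while any new codec-level producer fails to compile until it reports its children.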