[
https://issues.apache.org/jira/browse/LUCENE-10035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
wuda updated LUCENE-10035:
--------------------------
Comment: was deleted
(was: Thanks! I have committed and added a new pull request.)
> Simple text codec add multi level skip list data
> --------------------------------------------------
>
> Key: LUCENE-10035
> URL: https://issues.apache.org/jira/browse/LUCENE-10035
> Project: Lucene - Core
> Issue Type: Wish
> Components: core/codecs
> Affects Versions: main (9.0)
> Reporter: wuda
> Priority: Major
> Labels: Impact, MultiLevelSkipList, SimpleTextCodec
> Attachments: LUCENE-10035.patch
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> Add skip list data (including impacts) to the simple text codec to help
> understand the index format. For debugging, curiosity, transparency only!
> When a term's docFreq is greater than or equal to
> SimpleTextSkipWriter.BLOCK_SIZE (default value is 8), the simple text codec
> writes a skip list, and the *.pst (simple text term dictionary file)* file
> looks like this:
> {code:java}
> field title
> term args
> doc 2
> freq 2
> pos 7
> pos 10
> ## we omit docs for better view ......
> doc 98
> freq 2
> pos 2
> pos 6
> skipList
> ?
> level 1
> skipDoc 65
> skipDocFP 949
> impacts
> impact
> freq 1
> norm 2
> impact
> freq 2
> norm 12
> impact
> freq 3
> norm 13
> impacts_end
> ?
> level 0
> skipDoc 17
> skipDocFP 284
> impacts
> impact
> freq 1
> norm 2
> impact
> freq 2
> norm 12
> impacts_end
> skipDoc 34
> skipDocFP 624
> impacts
> impact
> freq 1
> norm 2
> impact
> freq 2
> norm 12
> impact
> freq 3
> norm 14
> impacts_end
> skipDoc 65
> skipDocFP 949
> impacts
> impact
> freq 1
> norm 2
> impact
> freq 2
> norm 12
> impact
> freq 3
> norm 13
> impacts_end
> skipDoc 90
> skipDocFP 1311
> impacts
> impact
> freq 1
> norm 2
> impact
> freq 2
> norm 10
> impact
> freq 3
> norm 13
> impact
> freq 4
> norm 14
> impacts_end
> END
> checksum 00000000000829315543
> {code}
> Compared with the previous format, we add the *skipList, level, skipDoc,
> skipDocFP, impacts, impact, freq, norm* nodes. At the same time, the simple
> text codec can now support advanceShallow at search time.
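
To make the new nodes concrete, here is a rough sketch of reading them back and picking the block that covers a target doc, which is roughly the kind of lookup advanceShallow needs. The class and method names (SkipListTextSketch, parse, blockFor) are made up for illustration and are not part of Lucene or of the patch. The sketch also assumes that each skipDoc marks the last doc covered by its block, which the description above does not state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; none of these names exist in Lucene.
final class SkipListTextSketch {

    /** One (freq, norm) pair taken from an "impact" node. */
    static final class Impact {
        final int freq;
        final long norm;
        Impact(int freq, long norm) { this.freq = freq; this.norm = norm; }
    }

    /** One skip entry: its level, skipDoc, skipDocFP, and the impacts listed under it. */
    static final class SkipEntry {
        final int level;
        final int skipDoc;
        final long skipDocFP;
        final List<Impact> impacts = new ArrayList<>();
        SkipEntry(int level, int skipDoc, long skipDocFP) {
            this.level = level;
            this.skipDoc = skipDoc;
            this.skipDocFP = skipDocFP;
        }
    }

    /** Parses the text lines of a skip list section into skip entries. */
    static List<SkipEntry> parse(List<String> lines) {
        List<SkipEntry> entries = new ArrayList<>();
        SkipEntry current = null;
        int level = -1, skipDoc = -1, freq = -1;
        for (String raw : lines) {
            String[] parts = raw.trim().split("\\s+");
            if (parts.length < 2) continue; // bare markers: "impacts", "impact", "impacts_end", "?"
            switch (parts[0]) {
                case "level": level = Integer.parseInt(parts[1]); break;
                case "skipDoc": skipDoc = Integer.parseInt(parts[1]); break;
                case "skipDocFP": // in the sample, skipDocFP always follows its skipDoc
                    current = new SkipEntry(level, skipDoc, Long.parseLong(parts[1]));
                    entries.add(current);
                    break;
                case "freq": freq = Integer.parseInt(parts[1]); break;
                case "norm": // a norm line completes the current (freq, norm) impact
                    if (current != null) current.impacts.add(new Impact(freq, Long.parseLong(parts[1])));
                    break;
                default: break;
            }
        }
        return entries;
    }

    /** Assumption: returns the first level-0 entry with skipDoc >= target, or null if none. */
    static SkipEntry blockFor(List<SkipEntry> entries, int target) {
        for (SkipEntry e : entries) {
            if (e.level == 0 && e.skipDoc >= target) return e;
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "level 0", "skipDoc 17", "skipDocFP 284", "impacts",
            "impact", "freq 1", "norm 2", "impact", "freq 2", "norm 12", "impacts_end");
        SkipEntry block = blockFor(parse(lines), 10);
        System.out.println(block.skipDoc + " @ " + block.skipDocFP + ", impacts=" + block.impacts.size());
    }
}
```

The linear scan in blockFor keeps the sketch short; a real reader would use the higher levels to skip ahead faster.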
>
> h2. Why is there a question mark symbol in the file?
> Because the *MultiLevelSkipListWriter* writes "length" and "childPointer"
> as VLong values. These are binary bytes rather than text, so a text viewer
> may show them as "?".
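
As a small illustration of why VLong bytes do not read as text, here is a self-contained sketch of a variable-length encoding: seven data bits per byte, low-order group first, high bit set when more bytes follow. It is modelled on the format documented for Lucene's DataOutput.writeVLong, but VLongSketch and its methods are hypothetical names, not Lucene code. The value 949 is borrowed from the sample file purely as an example input.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stand-alone sketch; not the Lucene implementation.
final class VLongSketch {

    /** Encodes a non-negative long, 7 bits per byte, low-order bits first. */
    static byte[] encodeVLong(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative: " + value);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7FL) | 0x80L)); // high bit set: more bytes follow
            value >>>= 7;
        }
        out.write((int) value); // last byte has the high bit clear
        return out.toByteArray();
    }

    /** Decodes a value produced by encodeVLong. */
    static long decodeVLong(byte[] bytes) {
        long value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (long) (b & 0x7F) << shift;
            shift += 7;
            if ((b & 0x80) == 0) break; // high bit clear: this was the last byte
        }
        return value;
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : encodeVLong(949L)) {
            hex.append(String.format("%02x ", b & 0xFF));
        }
        System.out.println(hex.toString().trim());
    }
}
```

For 949 the two bytes are 0xB5 and 0x07. Neither is a printable ASCII character, which is consistent with such values showing up as "?" in a text view.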
> h2. Does this speed up the search process?
> No! advanceShallow is available at search time, but searching is not faster
> yet. The skip list data is written after the docs (see the file shown
> above), so the reader must iterate over all docs before it reaches the skip
> list data. There is no "skipOffset" that would let it read the skip list
> data directly. As mentioned before, this is for debugging, curiosity, and
> transparency only! If this is a problem, I can add a "skipOffset" in a
> follow-up so that the skip list data can be read directly.
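
The difference a "skipOffset" would make can be sketched as follows. Without it, a reader has to walk every line until it meets the skipList marker. With a stored offset, it can start reading at the skip data straight away. SkipOffsetSketch, scanForSkipList, and readFrom are hypothetical names for illustration. How a real skipOffset would be stored is not specified in this issue, so the sketch simply passes the offset in as a parameter.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the idea only; not taken from the patch.
final class SkipOffsetSketch {

    /** Scans line by line and returns the byte offset of the "skipList" line, or -1. */
    static int scanForSkipList(byte[] data) {
        int lineStart = 0;
        for (int i = 0; i <= data.length; i++) {
            if (i == data.length || data[i] == '\n') {
                String line =
                    new String(data, lineStart, i - lineStart, StandardCharsets.UTF_8).trim();
                if (line.equals("skipList")) {
                    return lineStart; // only reached after reading all preceding doc lines
                }
                lineStart = i + 1;
            }
        }
        return -1;
    }

    /** With a known offset, reading starts at the skip data without touching the docs. */
    static String readFrom(byte[] data, int skipOffset) {
        return new String(data, skipOffset, data.length - skipOffset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = "doc 2\nfreq 2\nskipList\nlevel 0\n".getBytes(StandardCharsets.UTF_8);
        int offset = scanForSkipList(data); // what a reader must do today
        System.out.print(readFrom(data, offset)); // what a stored skipOffset would allow
    }
}
```

The scan costs time proportional to the size of the postings, while the direct read does not depend on how many docs precede the skip data.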
>
>