[ https://issues.apache.org/jira/browse/LUCENE-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15509280#comment-15509280 ]

Dawid Weiss edited comment on LUCENE-7453 at 9/21/16 9:02 AM:
--------------------------------------------------------------

bq. I think docNum is a good improvement because it makes it sound like we are 
numbering the documents, not assigning a unique identifier to them.

Sorry, but this explanation is even more controversial and vague to me (what is 
"numbering" of documents?). I'd prefer simply explaining that identifiers are 
persistent within an index segment (because they are), but index segments can 
be merged, so a document may move across index segments over time, changing 
its per-segment identifier.
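To make the merge argument concrete, here is a toy model in plain Java. It does not use Lucene; the {{MergeRenumbering}} class, its {{Segment}} record and the {{merge}} method are hypothetical names for illustration only. A segment is modeled as a list of external "id" values whose list position plays the role of the per-segment doc id, plus a bitset of deleted positions. The merge copies only live documents, in order, into a new segment, so the external id stays stable while the per-segment position shifts. This is a simplified sketch, not Lucene's actual merge code, and it assumes Java 16+ (records).

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class MergeRenumbering {

  /** Hypothetical segment model: position in externalIds = per-segment doc id. */
  record Segment(List<String> externalIds, BitSet deleted) {
    int maxDoc() {
      return externalIds.size();
    }

    /** Returns the per-segment doc id of the live doc with this external id, or -1. */
    int localDocOf(String externalId) {
      for (int doc = 0; doc < maxDoc(); doc++) {
        if (!deleted.get(doc) && externalIds.get(doc).equals(externalId)) {
          return doc;
        }
      }
      return -1;
    }
  }

  /** Copies the live docs of each input segment, in input order, into one new segment. */
  static Segment merge(List<Segment> inputs) {
    List<String> merged = new ArrayList<>();
    for (Segment s : inputs) {
      for (int doc = 0; doc < s.maxDoc(); doc++) {
        if (!s.deleted().get(doc)) {
          merged.add(s.externalIds().get(doc)); // its new doc id is merged.size() - 1
        }
      }
    }
    return new Segment(merged, new BitSet()); // a freshly merged segment has no deletes
  }

  public static void main(String[] args) {
    BitSet deletedInA = new BitSet();
    deletedInA.set(0); // "a0" was deleted but is still occupying doc 0 in segment a
    Segment a = new Segment(List.of("a0", "a1", "a2"), deletedInA);
    Segment b = new Segment(List.of("b0", "b1"), new BitSet());

    System.out.println("b1 before merge: doc " + b.localDocOf("b1"));
    Segment m = merge(List.of(a, b));
    System.out.println("b1 after merge:  doc " + m.localDocOf("b1"));
  }
}
```

Running {{main}} prints doc 1 for "b1" before the merge and doc 3 after it: the deleted "a0" is dropped and segment b's documents are appended after segment a's two live documents, while the external id "b1" never changes.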

If we really wish to make loops like this not use the "id" naming:
{code}
for (int docId = 0, max = indexReader.maxDoc(); docId < max; docId++) {
  // do something
}
{code}

then {{docNum}} doesn't really make it any better. Even {{docIndex}} seems 
better to me; in fact, "index" makes sense both at the segment level (where 
the index doesn't change) and at the composite reader level (where the 'index' 
of a document has more complex semantics). If we make it clear that a document 
index is volatile and is valid (and constant) only for the opened reader, then 
this is clearer to me.

{code}
for (int docIndex = 0, max = indexReader.maxDoc(); docIndex < max; docIndex++) {
  // do something
}
{code}
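The composite-reader 'index' can be sketched in the same toy style. Assuming each leaf (segment) contributes {{maxDoc}} slots laid out back to back, a leaf's base is the sum of the sizes of the leaves before it (Lucene exposes this per leaf as {{LeafReaderContext.docBase}}), and a composite index maps back to a (leaf, local doc) pair by finding the last leaf whose base is not greater than it. Empty leaves share a base with the next leaf, which is why the search looks for the last match. The {{CompositeDocIndex}} class and its methods are hypothetical; this is plain Java, not the Lucene API.

```java
public class CompositeDocIndex {

  /** bases[i] = sum of maxDoc of leaves 0..i-1, i.e. the prefix sums of leaf sizes. */
  static int[] docBases(int[] leafMaxDocs) {
    int[] bases = new int[leafMaxDocs.length];
    int base = 0;
    for (int i = 0; i < leafMaxDocs.length; i++) {
      bases[i] = base;
      base += leafMaxDocs[i];
    }
    return bases;
  }

  /**
   * Returns the ordinal of the leaf containing docIndex: the LAST leaf whose base is
   * <= docIndex, so empty leaves sharing a base are skipped. Assumes 0 <= docIndex < total.
   */
  static int leafOf(int docIndex, int[] bases) {
    int lo = 0;
    int hi = bases.length - 1;
    while (lo < hi) {
      int mid = (lo + hi + 1) >>> 1; // round up so the loop always makes progress
      if (bases[mid] <= docIndex) {
        lo = mid;
      } else {
        hi = mid - 1;
      }
    }
    return lo;
  }

  public static void main(String[] args) {
    int[] maxDocs = {3, 2, 4};       // three hypothetical leaves
    int[] bases = docBases(maxDocs); // {0, 3, 5}
    for (int docIndex = 0; docIndex < 9; docIndex++) {
      int leaf = leafOf(docIndex, bases);
      int localDoc = docIndex - bases[leaf];
      System.out.println("docIndex " + docIndex + " -> leaf " + leaf + ", local doc " + localDoc);
    }
  }
}
```

If the reader is reopened after deleted documents have been merged away, the leaf sizes and therefore the bases change, so the same {{docIndex}} can point at a different document. That is why such an index is only meaningful for the reader it was computed against.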







> Change naming of variables/apis from docid to docnum
> ----------------------------------------------------
>
>                 Key: LUCENE-7453
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7453
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Ryan Ernst
>
> In SOLR-9528 a suggestion was made to change {{docid}} to {{docnum}}. The 
> reasoning for this is most notably that {{docid}} has a connotation of a 
> persistent unique identifier (e.g. {{_id}} in Elasticsearch or {{id}} in 
> Solr), while {{docid}} in Lucene is currently local to a segment and not 
> directly comparable across segments.
> When I first started working on Lucene, I had this same confusion. {{docnum}} 
> is a much better name for this transient, segment-local identifier for a doc. 
> Regardless of what Solr wants to do in its API (e.g. keeping {{_docid_}}), I 
> think we should switch the Lucene APIs and variable names to use docnum.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
