Yes, that's the proper way to retrieve all terms.
With best regards,
Alexander Veremyev.
meecect wrote:
This works (defined in segmentInfo.php):
public function getTerms()
{
    $result = array();

    // The .tis file holds the segment's term dictionary.
    $tisFile = $this->openCompoundFile('.tis');

    // Header: format version, term count, index interval, skip interval.
    $tiVersion = $tisFile->readInt();
    if ($tiVersion != (int)0xFFFFFFFE) {
        throw new Zend_Search_Lucene_Exception('Wrong TermInfoFile file format');
    }
    $termCount     = $tisFile->readLong();
    $indexInterval = $tisFile->readInt();
    $skipInterval  = $tisFile->readInt();

    $prevTerm     = '';
    $freqPointer  = 0;
    $proxPointer  = 0;
    $indexPointer = 0;

    for ($count = 0; $count < $termCount; $count++) {
        // Terms are prefix-compressed against the previous term.
        $termPrefixLength = $tisFile->readVInt();
        $termSuffix       = $tisFile->readString();
        $termFieldNum     = $tisFile->readVInt();
        $termValue        = substr($prevTerm, 0, $termPrefixLength) . $termSuffix;

        // These fields are not needed for the term list, but they must
        // still be read to advance the file position to the next entry.
        $docFreq      = $tisFile->readVInt();
        $freqPointer += $tisFile->readVInt();
        $proxPointer += $tisFile->readVInt();
        if ($docFreq >= $skipInterval) {
            $skipOffset = $tisFile->readVInt();
        } else {
            $skipOffset = 0;
        }

        // Each result entry is a one-element array holding the term text.
        $result[] = array($termValue);
        $prevTerm = $termValue;
    }

    return $result;
}
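
The prefix/suffix step inside the loop is the least obvious part, so here is a minimal standalone sketch of it. The helper name decodePrefixCompressedTerms() and the sample data are hypothetical and not part of Zend_Search_Lucene. Like the snippet above, it uses the byte-based substr(), so it assumes single-byte (ASCII) terms.

```php
<?php
// Hypothetical helper (not a Zend_Search_Lucene API): rebuilds full terms
// from (prefixLength, suffix) pairs, mirroring the loop in getTerms().
function decodePrefixCompressedTerms(array $entries)
{
    $terms    = array();
    $prevTerm = '';

    foreach ($entries as $entry) {
        list($prefixLength, $suffix) = $entry;

        // Reuse the first $prefixLength bytes of the previous term,
        // then append the suffix stored for this entry.
        $term = substr($prevTerm, 0, $prefixLength) . $suffix;

        $terms[]  = $term;
        $prevTerm = $term;
    }

    return $terms;
}

// Made-up sample data in sorted order, as a term dictionary would store it.
$terms = decodePrefixCompressedTerms(array(
    array(0, 'apple'),
    array(3, 'ly'),   // "app" + "ly"
    array(5, 'ing'),  // "apply" + "ing"
    array(0, 'zend'),
));

print_r($terms);
```

With this sample input the decoded list is apple, apply, applying, zend. Because each term depends on the one before it, the entries have to be read in order from the start of the file.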