I'm still working on my own fix for this date-sorting stuff :-)
I'm working with an index of about 12000 pages. I want to sort them by
date, since they are the pages of a newspaper :-)
So, I used Gilles' patch (in combination with snapshot 111598).
> Memory: Real: 51M/122M act/tot Virtual: 45M/256M use/tot Free: 632K
> PID USERNAME PRI NICE SIZE RES STATE TIME CPU COMMAND
> 23979 user 42 0 47M 43M WAIT 0:12 17.90% htsearch
>
Size.. 47M (!) ....... free...632K ... *wowie* This happened when I tried
to retrieve ALL documents from the database (12000). Htsearch isn't able
to sort this many results by date.
I think time_t (anyway, compareTime) is the problem.
BTW, I pressed CTRL-C when it reached 47M... I'm sure htsearch would
have ended in a core dump otherwise...
When I run htsearch from the prompt, it asks me for a 'value for sort'.
When I enter 'date' and search for something that should return about 40
results, htsearch DOESN'T sort by date! (Does this have to do with the
snapshot release I use?)
In Display.cc::sort:
char str[80];
ResultMatch **array = new ResultMatch*[numberOfMatches];
+ if (numberOfMatches>1000) numberOfMatches=1000;
----
+ for(j=0; j < numberOfMatches; j++)
+ {
+ array[j]->setRef(docDB[array[j]->getURL()]);
+ }
matches->Release();
qsort((char *) array, numberOfMatches, sizeof(ResultMatch *),
Display::compare);
In Display.cc::compare:
int
Display::compare(const void *a1, const void *a2)
{
    /* I use this to sort on date.. don't care about Scores or so... */
    char buffer1[100];
    char buffer2[100];
    ResultMatch *m1 = *((ResultMatch **) a1);
    ResultMatch *m2 = *((ResultMatch **) a2);

    /* Format each document time as YYYYDDD (year + day of the year). */
    time_t t1 = m1->getRef()->DocTime();
    struct tm *tm1 = localtime(&t1);
    strftime(buffer1, sizeof(buffer1), "%Y%j", tm1);
    time_t t2 = m2->getRef()->DocTime();
    struct tm *tm2 = localtime(&t2);
    strftime(buffer2, sizeof(buffer2), "%Y%j", tm2);

    /* Newest day first. */
    return (atol(buffer2) - atol(buffer1));
}
I know this is an ugly piece of code :-) Don't bother me with that!
What I do here is (maybe stupid) as follows: I take the four-digit year
and the day of the year and glue them together (1998364, for example).
Gilles' patch is better on this, I suppose..
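
For what it's worth, the same YYYYDDD key can be built straight from the
struct tm fields, without going through strftime and atol. This is only a
sketch: dayKey and compareByDay are made-up names, not htsearch functions,
and they take plain time_t values rather than ResultMatch pointers.

```cpp
#include <ctime>

// Hypothetical helper (not part of htsearch): collapse a timestamp to a
// YYYYDDD day key such as 1998364, the same number "%Y%j" produces.
static long dayKey(std::time_t t)
{
    std::tm *tm = std::localtime(&t);
    if (!tm)
        return 0;  // assumption: an unconvertible time sorts as the oldest
    // tm_year counts from 1900; tm_yday is 0-based while %j is 1-based.
    return (tm->tm_year + 1900L) * 1000L + (tm->tm_yday + 1);
}

// qsort-style result: negative if t1 falls on the newer day, positive if
// t2 does, zero if both fall on the same day.
static int compareByDay(std::time_t t1, std::time_t t2)
{
    long k1 = dayKey(t1);
    long k2 = dayKey(t2);
    if (k2 < k1) return -1;
    if (k2 > k1) return 1;
    return 0;
}
```

Inside Display::compare the two arguments would be the DocTime() values
of m1 and m2. If ordering within a single day does not matter either way,
comparing the two time_t values directly would probably do as well.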
Using the routine above, I'm able to QUICKLY sort about 1000 documents by
date. Therefore, I have to build a limit into htsearch so that it can only
display 1000 matches, even if it found 12000...
The 1000-limit is because time_t eats too much mem (not really sure, but
when I comment the limit out, htsearch gives me a core-dump).
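
One caveat with cutting numberOfMatches down before the sort: the 1000
matches that survive are whichever ones come first in the array, not
necessarily the newest ones. A possible alternative is a partial sort,
which picks out the newest N and orders only those. The sketch below
shows the idea on a stand-in type; Match, newerFirst and keepNewest are
invented for illustration and are not htsearch code. It does nothing
about the memory use seen above, since every match still has to be held
in memory to be compared.

```cpp
#include <algorithm>
#include <cstddef>
#include <ctime>
#include <vector>

// Hypothetical stand-in for a search match; only the date matters here.
struct Match {
    int id;
    std::time_t docTime;
};

// Ordering predicate: newer documents come first.
static bool newerFirst(const Match &a, const Match &b)
{
    return a.docTime > b.docTime;
}

// Keep only the `limit` newest matches, sorted newest first.
static void keepNewest(std::vector<Match> &matches, std::size_t limit)
{
    if (matches.size() > limit) {
        // Orders just the first `limit` slots; the rest are left unsorted.
        std::partial_sort(matches.begin(), matches.begin() + limit,
                          matches.end(), newerFirst);
        matches.erase(matches.begin() + limit, matches.end());
    } else {
        std::sort(matches.begin(), matches.end(), newerFirst);
    }
}
```

With 12000 matches and a limit of 1000, this returns the 1000 most recent
pages instead of an arbitrary 1000.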
HtDig is great :-)