I'm still working on my own fix for this date-sorting stuff :-) 

I'm working with an index of about 12000 pages. I want to sort them by
date, since they are the pages of a newspaper :-)

So, I used Gilles' patch (in combination with snapshot 111598). 



> Memory: Real: 51M/122M act/tot  Virtual: 45M/256M use/tot  Free: 632K
>   PID USERNAME PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
> 23979 user      42    0   47M   43M WAIT    0:12 17.90% htsearch                     
>      

Size... 47M (!) ... free... 632K ... *wowie* This happened when I tried
to retrieve ALL documents (12000) from the database. htsearch isn't able
to sort that many results by date.
I think time_t (or rather compareTime) is the problem.
BTW, I pressed CTRL-C when it reached 47M... I'm sure htsearch would
have dumped core otherwise too...

When I run htsearch from the command prompt, it asks me for a 'value for
sort'. When I enter 'date' and search for something that should return
about 40 results, htsearch DOESN'T sort by date! (Does this have to do
with the snapshot release I use?)


In Display.cc::sort:

   char str[80];
   ResultMatch **array = new ResultMatch*[numberOfMatches];

+  if (numberOfMatches>1000) numberOfMatches=1000;

----

+  for(j=0; j < numberOfMatches; j++)
+  {
+     array[j]->setRef(docDB[array[j]->getURL()]);
+  }  

   matches->Release();

   qsort((char *) array, numberOfMatches, sizeof(ResultMatch *),
          Display::compare); 



In Display.cc::compare:

int
Display::compare(const void *a1, const void *a2)
{
    /* Sort on date only (newest first); scores are ignored. */
    char buffer1[100];
    char buffer2[100];

    ResultMatch *m1 = *((ResultMatch **) a1);
    ResultMatch *m2 = *((ResultMatch **) a2);

    /* Build a numeric "yyyyddd" key (year + day of year) per document. */
    time_t t1 = m1->getRef()->DocTime();
    struct tm *tm1 = localtime(&t1);
    /* Format now: the next localtime() call reuses the same static struct. */
    strftime(buffer1,sizeof(buffer1),"%Y%j",tm1);
    time_t t2 = m2->getRef()->DocTime();
    struct tm *tm2 = localtime(&t2);
    strftime(buffer2,sizeof(buffer2),"%Y%j",tm2);

    /* Positive when m2 is newer, so newer documents sort first. */
    return (atol(buffer2)-atol(buffer1));
}
 
I know this is an ugly piece of code :-) Don't bother me with that!

What I do here (maybe stupidly) is this: I take the year and the day of
the year and compare them as one number (1998364, for example). Gilles'
patch is better at this, I suppose...
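For reference, the "%Y%j" key is just year * 1000 + day-of-year, and
because %j is always three digits (001..366) the number grows with the
date, which is why subtracting two keys orders documents correctly. The
sketch below computes that key both arithmetically from struct tm and via
strftime, as the comparator does. It uses gmtime() instead of localtime()
only so the result does not depend on the time zone; the function names
are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <ctime>

// Year * 1000 + day of year (1..366), e.g. 1998364 for 30 Dec 1998.
// tm_yday counts from 0, so add 1 to match strftime's %j.
long dayKeyArithmetic(time_t t)
{
    struct tm *tm = std::gmtime(&t);   // gmtime: time-zone independent
    assert(tm != nullptr);
    return (tm->tm_year + 1900L) * 1000L + (tm->tm_yday + 1L);
}

// Same key built the way the comparator above does it: format, then parse.
long dayKeyViaStrftime(time_t t)
{
    char buffer[32];
    struct tm *tm = std::gmtime(&t);
    assert(tm != nullptr);
    std::strftime(buffer, sizeof(buffer), "%Y%j", tm);
    return std::atol(buffer);
}
```

Both give 1998364 for time_t 915019200 (30 Dec 1998, 12:00 UTC). The
arithmetic version avoids building and parsing a string on every call.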

Using the routine above, I'm able to QUICKLY sort about 1000 documents by
date. Therefore, I had to build a limit into htsearch so that it displays
at most 1000 matches, even if it found 12000...
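On speed: the comparator above calls localtime() and strftime() twice for
every comparison, and qsort makes on the order of n log n comparisons. A
minimal sketch of a comparator that compares the time_t values directly
is below. DocRefStub and MatchStub are hypothetical stand-ins for
DocumentRef and ResultMatch, modelling only DocTime() and getRef().
Unlike the "%Y%j" key, this orders documents within the same day too.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <ctime>

// Hypothetical stand-ins; only the date accessors are modelled.
struct DocRefStub {
    time_t docTime;
    time_t DocTime() const { return docTime; }
};

struct MatchStub {
    DocRefStub *ref;
    DocRefStub *getRef() const { return ref; }
};

// qsort comparator, newest first. Compares raw time_t values, so there is
// no per-comparison localtime()/strftime() and no subtraction overflow.
int compareByDateDesc(const void *a1, const void *a2)
{
    const MatchStub *m1 = *static_cast<MatchStub *const *>(a1);
    const MatchStub *m2 = *static_cast<MatchStub *const *>(a2);
    assert(m1->getRef() != nullptr && m2->getRef() != nullptr);

    time_t t1 = m1->getRef()->DocTime();
    time_t t2 = m2->getRef()->DocTime();
    if (t1 < t2) return 1;    // m1 is older: place it after m2
    if (t1 > t2) return -1;   // m1 is newer: place it before m2
    return 0;
}

// Sorts an array of match pointers in place, newest first.
void sortByDate(MatchStub **array, std::size_t count)
{
    std::qsort(array, count, sizeof(MatchStub *), compareByDateDesc);
}
```

Usage mirrors the existing call: qsort(array, numberOfMatches,
sizeof(ResultMatch *), ...) with this comparison in place of
Display::compare.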

The 1000 limit is there because time_t eats too much memory (I'm not
really sure, but when I comment it out, htsearch doesn't dump core).
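One side effect, as far as the fragment in Display.cc::sort shows: the
limit is applied before qsort, so the 1000 matches kept are simply the
first 1000 in the array, not necessarily the 1000 newest. A minimal
sketch of "sort, then keep the newest N" using std::partial_sort is
below; Hit and newestHits are hypothetical names, and the sketch assumes
every hit's date is already available.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ctime>
#include <vector>

// Hypothetical minimal record: one search hit and its document date.
struct Hit {
    int docId;
    time_t docTime;
};

// Returns the `limit` newest hits, newest first. std::partial_sort only
// orders the first `limit` positions: about O(n log limit) work.
std::vector<Hit> newestHits(std::vector<Hit> hits, std::size_t limit)
{
    if (limit > hits.size()) limit = hits.size();
    auto newerFirst = [](const Hit &a, const Hit &b) {
        return a.docTime > b.docTime;
    };
    auto middle = hits.begin() + static_cast<std::ptrdiff_t>(limit);
    std::partial_sort(hits.begin(), middle, hits.end(), newerFirst);
    hits.resize(limit);
    assert(std::is_sorted(hits.begin(), hits.end(), newerFirst));
    return hits;
}
```

This still needs a date for every hit, so it addresses which 1000 are
shown, not the memory used to look up 12000 documents.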


HtDig is great :-)

----------------------------------------------------------------------
To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the body of the message.