Hi Slava,

Slava Pestov wrote:
 > Hi Phil,
 >
 > We don't have this yet. Were you able to do what you wanted to do
 > using mmap alone?
 >

Yes thanks.

Actually in the end I used mmap + some C code - I was writing a suffix 
array and couldn't find a fast way to construct it in factor. I used the 
code below which comes in at about 3 seconds for a 12k text file full of 
\0 terminated strings (the c version called from factor with mmapped 
text file takes 30ms).

Unfortunately the text files I need to index are about a gig and 
mergesort is O(nlogn) - I didn't bother waiting for it to finish on a 
1mb file. Is there anything obvious I can do to speed the factor version up?

Many thanks,

Phil

---------

: pos>substring ( pos symbolsseq -- str )
   2dup 0 -rot index*
   swap subseq >string ; inline

: suffix-compare ( apos bpos symbolsseq -- n )
   [ pos>substring ] curry 2apply <=> ; inline

: sort-suffix-array ( suffixarray symbolsbarray -- sortedsuffixarray )
   [ suffix-compare ] curry sort ;

: generate-suffix-array ( symbolsfname -- array )
   dup file-length tuck [ sort-suffix-array ] with-mapped-file
;

[ "testdata/data.txt" generate-suffix-array ] time

---------

Here's the c code for reference:

------

static int compare_suffix_ptrs(const void *m1, const void *m2) {
   const char **a = (const char **)m1;
   const char **b = (const char **)m2;
   return strcasecmp(*a,*b);
}

void suffix_sort(unsigned int *suffixarray, unsigned int size,
                        const char* strbuffer)
{
    unsigned int i;
    // turn from array of offsets into array of pointers
    for (i=0; i<size;i++) suffixarray[i] += (unsigned int)strbuffer;
    qsort(suffixarray,size,4,compare_suffix_ptrs);
    // turn back into array of offsets
    for (i=0; i<size;i++) suffixarray[i] -= (unsigned int)strbuffer;
}

-------

which is called via:

: generate-suffix-array-c ( symbolsfname -- array )
   [ file-length dup >c-ulong-array swap dupd ] keep
   dup file-length [ mapped-file-address suffix_sort ] with-mapped-file
;




-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
Factor-talk mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/factor-talk

Reply via email to