Hi Slava,
Slava Pestov wrote:
> Hi Phil,
>
> We don't have this yet. Were you able to do what you wanted to do
> using mmap alone?
>
Yes thanks.
Actually in the end I used mmap + some C code - I was writing a suffix
array and couldn't find a fast way to construct it in factor. I used the
code below which comes in at about 3 seconds for a 12k text file full of
\0 terminated strings (the c version called from factor with mmapped
text file takes 30ms).
Unfortunately the text files I need to index are about a gig and
mergesort is O(nlogn) - I didn't bother waiting for it to finish on a
1mb file. Is there anything obvious I can do to speed the factor version up?
Many thanks,
Phil
---------
: pos>substring ( pos symbolsseq -- str )
2dup 0 -rot index*
swap subseq >string ; inline
: suffix-compare ( apos bpos symbolsseq -- n )
[ pos>substring ] curry 2apply <=> ; inline
: sort-suffix-array ( suffixarray symbolsbarray -- sortedsuffixarray )
[ suffix-compare ] curry sort ;
: generate-suffix-array ( symbolsfname -- array )
dup file-length tuck [ sort-suffix-array ] with-mapped-file
;
[ "testdata/data.txt" generate-suffix-array ] time
---------
Here's the c code for reference:
------
static int compare_suffix_ptrs(const void *m1, const void *m2) {
const char **a = (const char **)m1;
const char **b = (const char **)m2;
return strcasecmp(*a,*b);
}
void suffix_sort(unsigned int *suffixarray, unsigned int size,
const char* strbuffer)
{
unsigned int i;
// turn from array of offsets into array of pointers
for (i=0; i<size;i++) suffixarray[i] += (unsigned int)strbuffer;
qsort(suffixarray,size,4,compare_suffix_ptrs);
// turn back into array of offsets
for (i=0; i<size;i++) suffixarray[i] -= (unsigned int)strbuffer;
}
-------
which is called via:
: generate-suffix-array-c ( symbolsfname -- array )
[ file-length dup >c-ulong-array swap dupd ] keep
dup file-length [ mapped-file-address suffix_sort ] with-mapped-file
;
-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
Factor-talk mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/factor-talk