We have a 1 million record index that is about 6GB in size. We build
it in parallel without acts_as_ferret (AAF), so it's hard to comment on
the speed of your index build. However, I will say that I did need to
manually patch Ferret to handle large indexes better.
Here is the diff:
--- /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.4/ext/index.c
+++ index.c
@@ -1375,7 +1375,7 @@
     lazy_doc = lazy_doc_new(stored_cnt, fdt_in);
     for (i = 0; i < stored_cnt; i++) {
-        int start = 0, end, data_cnt;
+        off_t start = 0, end, data_cnt;
         field_num = is_read_vint(fdt_in);
         fi = fr->fis->fields[field_num];
         data_cnt = is_read_vint(fdt_in);
@@ -1449,7 +1449,7 @@
     if (store_offsets) {
         int num_positions = tv->offset_cnt = is_read_vint(fdt_in);
         Offset *offsets = tv->offsets = ALLOC_N(Offset, num_positions);
-        int offset = 0;
+        off_t offset = 0;
         for (i = 0; i < num_positions; i++) {
             offsets[i].start = offset += is_read_vint(fdt_in);
             offsets[i].end = offset += is_read_vint(fdt_in);
@@ -1683,8 +1683,8 @@
     int last_end = 0;
     os_write_vint(fdt_out, offset_count); /* write shared prefix length */
     for (i = 0; i < offset_count; i++) {
-        int start = offsets[i].start;
-        int end = offsets[i].end;
+        off_t start = offsets[i].start;
+        off_t end = offsets[i].end;
         os_write_vint(fdt_out, start - last_end);
         os_write_vint(fdt_out, end - start);
         last_end = end;
@@ -4799,7 +4799,7 @@
  *
  ************************************************************************
  ****/
-Offset *offset_new(int start, int end)
+Offset *offset_new(off_t start, off_t end)
 {
     Offset *offset = ALLOC(Offset);
     offset->start = start;
On Aug 8, 2007, at 4:16 PM, Craig Jolicoeur wrote:
> I have a MySQL table with over 18 million records in it. We are
> indexing about 10 fields in this table with ferret.
>
> I am having problems with the initial building of the index. I created
> a rake task to run the "Model.rebuild_index" command in the background.
> That process ran fine for about 2.5 days before it just suddenly
> stopped. The log/ferret_index.log file says it got to about 28% before
> ending. I'm not sure if the process died because of something on my
> server or because of something related to ferret.
>
> It appears that it will take close to 10 days for the full index to be
> built with rebuild_index. Is this normal for a table of this size?
> Also, is there a way to start where the index ended and update from
> there instead of having to rebuild the entire index from scratch? I got
> about 28% of the way through, so I would like not to waste the 2.5
> days it took to build that part while getting the full index 100%
> built.
>
> Also, is there a way I can non-destructively rebuild the index, since
> it didn't complete 100%? Meaning, can I rebuild it without overwriting
> what is already there? That way I could keep searching what I have
> while the rebuild takes place, and then move the new index over the
> old one. I'm not running ferret as a DRb server, so I don't know if
> I can.
>
> Also, is there a faster or better way I can/should be building the
> index? Will I have an issue with the index file sizes with a DB this
> size?
> --
> Posted via http://www.ruby-forum.com/.
> _______________________________________________
> Ferret-talk mailing list
> [email protected]
> http://rubyforge.org/mailman/listinfo/ferret-talk