Thanks for having a think about this. I think our problems are related to htmerge, but given that I had to make changes to get the code to compile and run on 64-bit Solaris, I was concerned that there may be other 64-bit issues. Our word.db files are growing beyond 2 GB (uncompressed, due to zlib errors we encountered earlier). When we merge large databases, searches for words that return results against the pre-merged databases return nothing against the merged database. We only seem to hit these issues once htmerge needs more than 4 GB of memory.
The reason we are using htmerge is that we need a composite index of content available from both within and outside the university, and it also lets us spider faster. If you can suggest why we may be having these problems, we'd be really grateful.
Thanks
Sandy
p.s.
We are using 3.2.0b4-20030126 and are compiling statically.
To make it compile for 64 bit with gcc, I changed a line in include/htconfig.h from:
(I am not a c/c++ developer so these changes are largely based on hunches)
/* Define this to the type of the third argument of getpeername() */
#define GETPEERNAME_LENGTH_T size_t
to:
/* Define this to the type of the third argument of getpeername() */
#define GETPEERNAME_LENGTH_T socklen_t
and in htlib/String.cc (to resolve a segmentation error)
void String::copy_data_from(const char *s, int len, int dest_offset)
{
    memcpy(Data + dest_offset, s, len);
}
to:
void String::copy_data_from(const char *s, size_t len, size_t dest_offset)
{
    memcpy(Data + dest_offset, s, len);
}
On Monday, July 7, 2003, at 10:10 pm, Geoff Hutchison wrote:
Can anyone comment on what would be required to make the htdig 3.2b4 code 64-bit clean?
I'm not familiar with issues on Solaris, but ht://Dig has long been "clean" on Alpha systems. So it should be 64-bit clean as-is. The Berkeley DB code is most definitely 64-bit clean and handles databases up to 4TB if you've got the hardware for it.
I am merging large databases, and htmerge requires more than 4 GB of memory, so I have had to compile with -m64 (Solaris 8, gcc 3.2.1), but had to make some small changes in the code to get it to compile and to avoid a segmentation fault.
OK, what changes did you make exactly? What snapshot did you use? Are you attempting to compile/use shared libraries?
I am indexing more than 400,000 pages - any thoughts on whether htdig 3.3 can scale to this?
I'm curious why this is causing problems. I know people who have 32-bit systems that easily handle 400,000+ pages. How big are your databases exactly? Are your problems limited to htmerge? (In which case, I likely know the problem, and it's not due to 64-bit addressing.)
-Geoff
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
_______________________________________________
ht://Dig Developer mailing list:
[EMAIL PROTECTED]
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev