On Wed, 25 Aug 2010 13:13:21 +0200 Julien Valroff <jul...@kirya.net> wrote:
> Hi Stevan,
>
Hello Julien,

> On Wednesday 25 August 2010 at 10:20:48 (+0200), Stevan Bajić wrote:
> > The hash driver has one issue and that is that it is not portable between
> > 32bit and 64bit.
> > So you need to use a 64bit system if the css file was created on a 64bit
> > installation.
>
> I now understand what happened! I knew that but haven't thought of it when
> testing.
>
The really big problem with the Hash driver is that everything in it is
tied to the in-memory layout. The driver has a structure, saves that
structure to disk and loads it back from disk. The problem is that some
values have a different length depending on the number of bits the system
uses: some values are 4 bytes long on 32 bit systems while the same value
is 8 bytes long on 64 bit systems. Usually this is not a problem, but since
the structure is saved to disk without any transformation, the saved file
breaks if you create it on a 32 bit system and then read it on a 64 bit
system, or the other way around.

The other problem is that the Hash driver does NOT know what bit length
was used when the CSS file was saved. So when the Hash driver loads a CSS
file it expects a certain structure (depending on the bit depth of the
current system) and then fails if that structure is different.

> > However... the tools
> > and some part of the hash driver is broken IMHO.
>
> Given Eric's and Eugene's information, yes, it seems…
>
The problem was introduced in 3.9.0 when Sensory Networks merged a patch
that uses a trick to encapsulate, inside the spam/ham counters, another
counter that is used to purge tokens. The idea is okay, but the
implementation of that trick is not big-/little-endian safe.

> > I quickly did a patch for cssstat on my 64bit system and I get the cssstats
> > back to work without a
> > segmentation fault:
> [...]
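To make the 32/64 bit and byte-order problem described above concrete,
here is a minimal sketch in C. It is NOT the actual DSPAM code: the struct
names, the field names and the 16 byte on-disk size are assumptions made up
for illustration. It shows a record declared with fixed-width types and
written field by field in one fixed byte order (little-endian here),
instead of dumping the memory image of the structure.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical record using native types: "long" is typically 4 bytes on
 * 32 bit Linux and 8 bytes on 64 bit Linux, so sizeof() differs there. */
struct native_record {
    unsigned long long token_id;
    long ham_hits;
    long spam_hits;
};

/* Same hypothetical fields with fixed-width types: 8 + 4 + 4 = 16 bytes. */
struct portable_record {
    uint64_t token_id;
    uint32_t ham_hits;
    uint32_t spam_hits;
};

#define RECORD_DISK_SIZE 16
static_assert(sizeof(struct portable_record) == RECORD_DISK_SIZE,
              "unexpected padding in portable_record");

/* Store the low nbytes of v as little-endian bytes. */
static void put_le(unsigned char *p, uint64_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++)
        p[i] = (unsigned char)((v >> (8 * i)) & 0xff);
}

/* Read nbytes little-endian bytes back into an unsigned value. */
static uint64_t get_le(const unsigned char *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Encode field by field; the result does not depend on the host CPU. */
void record_encode(const struct portable_record *r,
                   unsigned char out[RECORD_DISK_SIZE])
{
    put_le(out, r->token_id, 8);
    put_le(out + 8, r->ham_hits, 4);
    put_le(out + 12, r->spam_hits, 4);
}

/* Decode the same 16 byte layout back into the structure. */
void record_decode(const unsigned char in[RECORD_DISK_SIZE],
                   struct portable_record *r)
{
    r->token_id = get_le(in, 8);
    r->ham_hits = (uint32_t)get_le(in + 8, 4);
    r->spam_hits = (uint32_t)get_le(in + 12, 4);
}
```

With something along these lines, a file written on a 32 bit big-endian
machine and one written on a 64 bit little-endian machine would contain the
same bytes for the same record. The price is that every read and write goes
through an encode/decode step instead of a plain copy of the structure.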
> > I will try to make the other css tools to work again but as said before:
> > this will not fix the fundamental
> > issue with 32/64 bit.
>
> I think the most important is to get things work.
>
What do you consider as "to get things work"? IMHO a working Hash driver
is a driver that works with the same file on 32bit and on 64bit, and that
works with the same CSS file regardless of where it was created. So to sum
it up: the driver should be independent of the bit width and of
big-/little-endian byte order.

> As for the architecture dependent css files, I do not think it's a big
> problem.
>
For most users you are right. But what about those using Mac OS X on PPC
and then switching to Mac OS X on Intel? Their CSS file will not work after
moving from PPC to Intel, regardless of whether the bit depth changed too.

> It should however be clear for the new users when choosing a backend, hence
> it must be clearly
> stated in the documentation - it may already be the case, I haven't checked
> yet - if not, I can write
> a small warning about this in the README file.
>
I don't know if this is in the README. I do so much work that I focused
mostly on the code and have not done much documentation. English is not my
native language and I expected the other admins and the users out there to
fill in that part. Unfortunately this has not happened.

> Can a css file created on 32 bit architecture work on 64 bit arch?
>
With the current implementation? NO!

> > Looking at the current code I am not sure if the fixes will work with
> > already broken css files? I mean: I
> > don't know if fixing the hash driver will force users to recreate their css
> > file and start from scratch?
>
> In case it doesn't work, would that be a big work to write a new tool like
> cssrepair
> (I mean in middle or long term)?
>
That cssrepair would be extremely hard to make. The problem is that one
has to guess a lot of things in order to make a cssrepair.
From the CSS file alone you can hardly say with confidence what is wrong
with it and what is not. To be able to do that, one would need to extend
the CSS file and add some kind of information about its structure (so that
cssrepair could use that info). Even better would be to add some kind of
tags inside the CSS file, so that even if the information about the
structure is missing, a cssrepair tool could traverse the CSS file, search
for those tags and try to fix things.

> Again, it would already be a very good thing to fix the hash driver.
>
The question is what you mean by "fix". Getting the driver to work for
Eric is not such a big task. In fact I already have that part fixed in a
local branch of DSPAM (I showed in an earlier message that this part is
working). But getting the driver to be bit safe is another issue.

> Documentation is also the key there - everything should be clearly stated in
> the README and in the release
> notes.
>
Well... the documentation is quickly fixed, but that will not prevent users
from running into the segmentation fault. Much better would be to NOT
crash, regardless of what the README says.

> > Would that be an issue or should I just go on and fix things and not care
> > about the compatibility?
>
> No, I think compatibility is less important than fixing things, though, in an
> ideal world,
>
Well... since I have different needs for DSPAM, I started a while ago a
local fork of DSPAM where I implement things that I always wanted to have
in DSPAM. One of them is the use of fixed length types. I have a local
working version in which the Hash driver has exactly the same structure on
32 and 64 bit. My local version DOES NOT fix the issue with the CSS file
from Eric. What it can do is trigger an error message if the structure is
not the way the Hash driver expects it, so I don't get a segmentation
fault. BUT my local version does not work with Eric's file either.
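The two ideas above (structure information stored in the CSS file, and an
error message instead of a segmentation fault) could be sketched in C
roughly as follows. Everything here is hypothetical and only for
illustration: the magic tag, the 8 byte header layout and the function
names are assumptions and do not come from DSPAM or from my local branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CSS_HEADER_SIZE 8

/* Hypothetical tag identifying the file type and layout version. */
static const unsigned char CSS_MAGIC[4] = { 'C', 'S', 'S', '1' };

enum css_status { CSS_OK = 0, CSS_TOO_SHORT, CSS_BAD_MAGIC, CSS_BAD_LAYOUT };

/* Hypothetical header: 4 magic bytes, 1 byte record size, 3 reserved. */
void css_write_header(unsigned char out[CSS_HEADER_SIZE], uint8_t record_size)
{
    assert(out != NULL);   /* a NULL buffer is a caller bug, not a file error */
    memset(out, 0, CSS_HEADER_SIZE);
    memcpy(out, CSS_MAGIC, sizeof CSS_MAGIC);
    out[4] = record_size;
}

/* Validate a header read from disk before touching any record data.
 * Problems with the file content are reported as a status code, so the
 * caller can print an error message instead of crashing. */
enum css_status css_check_header(const unsigned char *buf, size_t len,
                                 uint8_t expected_record_size)
{
    if (buf == NULL || len < CSS_HEADER_SIZE)
        return CSS_TOO_SHORT;
    if (memcmp(buf, CSS_MAGIC, sizeof CSS_MAGIC) != 0)
        return CSS_BAD_MAGIC;
    if (buf[4] != expected_record_size)
        return CSS_BAD_LAYOUT;
    return CSS_OK;
}
```

A loader would call css_check_header() on the first bytes of the file and
refuse to go on for anything other than CSS_OK. A repair tool could use
the recorded record size rather than guessing it. Existing CSS files have
no such header, so this would only help with newly created files.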
However... I am pretty sure that files created with my locally modified
Hash driver will not have the issues Eric is reporting. But I really need
more testing, since I have changed a lot of code and have not had enough
time to test everything.

Now you will surely ask me to commit those changes. But you know what? The
change is huge! I had to change almost every file in DSPAM. Committing
something like that would be a huge task, and I don't know if it would be
okay for the 3.9.x release. I have also not fixed everything, nor have I
checked all the storage backends. I can say that the MySQL driver is
working, and I can say that the structure of the Hash driver is the same
on 32bit and on 64bit. I still have not fixed the endianness of the Hash
driver. Next I will work on the PostgreSQL driver, then on the SQLite 2/3
drivers, and after that I will go on and fix the endianness of the Hash
driver.

My goal is to increase the stability of DSPAM, and I need/want the code to
be predictable (currently it is not very predictable when you look at the
Hash driver). I need a good foundation before I can even think of
extending DSPAM, and I really need to extend it: I want to extend the
DLMTP protocol so that I can decouple the Web-UI from the system where
DSPAM is running, and I need to add additional features.

> Cheers,
> Julien
>
--
Kind Regards from Switzerland,

Stevan Bajić

> --
> Julien Valroff <jul...@kirya.net>
> http://www.kirya.net
> GPG key: 1024D/9F71D449
> 17F4 93D8 746F F011 B845 9F91 210B F2AB 9F71 D449

_______________________________________________
Dspam-devel mailing list
Dspam-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-devel