On Wed, 25 Aug 2010 13:13:21 +0200
Julien Valroff <jul...@kirya.net> wrote:

> Hi Stevan,
> 
Hello Julien,


> On Wednesday 25 August 2010 at 10:20:48 (+0200), Stevan Bajić wrote:
> > The hash driver has one issue and that is that it is not portable between 
> > 32bit and 64bit.
> > So you need to use a 64bit system if the css file was created on a 64bit 
> > installation.
> 
> I now understand what happened! I knew that but didn't think of it when 
> testing.
> 
The big problem with the Hash driver is that everything in it is tied to the 
in-memory layout. The driver has a structure, writes that structure to disk 
and loads it back as is. The problem is that some values have a different 
length depending on the word size of the system. Some values are 4 bytes long 
on 32 bit systems while the same value is 8 bytes long on 64 bit systems. 
Usually this is not a problem, but since the structure is saved to disk without 
any transformation, the saved file breaks if you create it on a 32 bit system 
and then read it on a 64 bit system, or the other way around. The other problem 
is that the driver does NOT know what word size was in use when the CSS file 
was saved. So when the Hash driver loads a CSS file it expects a certain 
structure (depending on the word size of the current system) and then fails if 
the structure is different.
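
To illustrate the size difference, here is a minimal sketch. The struct and 
field names are made up for this example and are not copied from the DSPAM 
source. It assumes a typical platform where "long" is 4 bytes on 32 bit and 8 
bytes on 64 bit (LP64), and it needs a C11 compiler for static_assert:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record using native types. sizeof(long) is 4 on typical
 * 32 bit systems and 8 on LP64 systems, so the size written to disk
 * depends on the machine that created the file. */
struct native_record {
    unsigned long long hashcode;
    long nonspam;
    long spam;
};

/* Hypothetical record using fixed width types. Every field has the same
 * size on 32 bit and on 64 bit systems. */
struct fixed_record {
    uint64_t hashcode;
    uint32_t nonspam;
    uint32_t spam;
};

/* Fail at compile time if the fixed layout ever changes size. */
static_assert(sizeof(struct fixed_record) == 16,
              "fixed_record must be 16 bytes");

size_t native_record_size(void) { return sizeof(struct native_record); }
size_t fixed_record_size(void)  { return sizeof(struct fixed_record); }
```

On an LP64 system native_record_size() would typically be 24 and on a 32 bit 
system 16, while fixed_record_size() is 16 on both. Fixed width types only 
address the size problem; byte order is a separate issue.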


> > However... the tools 
> > and some parts of the hash driver are broken IMHO.
> 
> Given Eric's and Eugene's information, yes, it seems…
>  
The problem was introduced in 3.9.0 when Sensory Networks merged a patch that 
uses a trick to embed another counter, used to purge tokens, inside the 
spam/ham counters. The idea is okay, but the problem is that the 
implementation of that trick is not big-/little-endian safe.
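
As a rough sketch of what an endian safe variant of such a trick could look 
like: pack the extra counter with shifts and masks on the value (never by 
overlaying bytes), and write the result to disk byte by byte in a fixed order. 
The 24/8 bit split and all function names below are assumptions made for this 
example. They are not the layout used by the merged patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical split: low 24 bits hold the hit count, high 8 bits hold
 * the purge counter. */
#define COUNT_BITS 24u
#define COUNT_MASK ((UINT32_C(1) << COUNT_BITS) - 1u)

/* Combine both counters using arithmetic on the value, which behaves the
 * same on big and little endian hosts. */
uint32_t pack_counter(uint32_t count, uint8_t age)
{
    assert(count <= COUNT_MASK);  /* count must fit in 24 bits */
    return ((uint32_t)age << COUNT_BITS) | count;
}

uint32_t unpack_count(uint32_t packed) { return packed & COUNT_MASK; }
uint8_t  unpack_age(uint32_t packed)   { return (uint8_t)(packed >> COUNT_BITS); }

/* Store a 32 bit value as four bytes, least significant byte first,
 * regardless of the byte order of the host. */
void put_le32(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v & 0xffu);
    out[1] = (unsigned char)((v >> 8) & 0xffu);
    out[2] = (unsigned char)((v >> 16) & 0xffu);
    out[3] = (unsigned char)((v >> 24) & 0xffu);
}

/* Read back a value written by put_le32(). */
uint32_t get_le32(const unsigned char *in)
{
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}
```

Because the bytes are produced from the value and not from its memory 
representation, a file written this way reads back the same on PPC and on 
Intel.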


> > I quickly did a patch for cssstat on my 64bit system and I get the cssstats 
> > back to work without a 
> > segmentation fault:
> [...] 
> > I will try to make the other css tools to work again but as said before: 
> > this will not fix the fundamental 
> > issue with 32/64 bit.
> 
> I think the most important is to get things work.
> 
What do you consider as "to get things work"? IMHO a working Hash driver is a 
driver that works with the same file on 32bit and on 64bit, and that works with 
the same CSS file regardless of where it was created. So to sum it up: the 
driver should be independent of word size and of byte order (big/little 
endian).


> As for the architecture-dependent css files, I do not think it's a big 
> problem.
>
For most users you are right. But what about those using Mac OS X on PPC and 
then switching to Mac OS X on Intel? Their CSS file will not work after moving 
from PPC to Intel, regardless of whether the word size changed as well.


> It should however be clear for the new users when choosing a backend, hence 
> it must be clearly
> stated in the documentation - it may already be the case, I haven't checked 
> yet - if not, I can write
> a small warning about this in the README file.
> 
I don't know whether this is in the README. I have so much work that I have 
focused mostly on the code and have not written much documentation. English is 
not my native language, and I expected the other admins and the users out there 
to fill in that part. Unfortunately this has not happened.


> Can a css file created on 32 bit architecture work on 64 bit arch?
> 
With the current implementation? NO!


> > Looking at the current code I am not sure if the fixes will work with 
> > already broken css files? I mean: I 
> > don't know if fixing the hash driver will force users to recreate their css 
> > file and start from scratch?
> 
> In case it doesn't work, would it be a lot of work to write a new tool like 
> cssrepair
> (I mean in the middle or long term)?
>
Such a cssrepair would be very hard to write. The problem is that one has to 
guess a lot of things in order to repair a file. From the CSS file alone you 
can hardly say with confidence what is wrong with it and what is not. To be 
able to do that, one would need to extend the CSS file and add some kind of 
information about its structure (so that cssrepair could use that info). Even 
better would be to add some kind of tags inside the CSS file, so that even if 
the information about the structure is missing, a cssrepair tool could traverse 
the CSS file, search for those tags and try to fix things.
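
A small sketch of the "information about its structure" idea: a short header 
at the start of the file that names the format, the record size and the byte 
order, plus a check that returns an error code when the header does not match 
what the running driver expects. The magic tag "CSSv", the 8 byte header 
layout and all names are invented for this example. The current CSS format 
has nothing like this:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

#define CSS_MAGIC      "CSSv"  /* hypothetical 4 byte tag */
#define CSS_HEADER_LEN 8
#define CSS_VERSION    1
#define CSS_LITTLE     1       /* on-disk byte order flag */

static_assert(sizeof(CSS_MAGIC) - 1 == 4, "magic tag must be 4 bytes");

enum css_check { CSS_OK = 0, CSS_TOO_SHORT, CSS_BAD_MAGIC, CSS_BAD_LAYOUT };

/* Header bytes: magic[4], version, record size, byte order, reserved. */
void css_write_header(unsigned char *buf, uint8_t record_size)
{
    memcpy(buf, CSS_MAGIC, 4);
    buf[4] = CSS_VERSION;
    buf[5] = record_size;
    buf[6] = CSS_LITTLE;
    buf[7] = 0;
}

/* Validate a header before touching any record. Returns an error code
 * so the caller can print a message and stop. */
enum css_check css_check_header(const unsigned char *buf, size_t len,
                                uint8_t expected_record_size)
{
    if (len < CSS_HEADER_LEN)
        return CSS_TOO_SHORT;
    if (memcmp(buf, CSS_MAGIC, 4) != 0)
        return CSS_BAD_MAGIC;
    if (buf[4] != CSS_VERSION || buf[5] != expected_record_size ||
        buf[6] != CSS_LITTLE)
        return CSS_BAD_LAYOUT;
    return CSS_OK;
}
```

A driver built for 24 byte records that opens a file written with 16 byte 
records would get CSS_BAD_LAYOUT and could report that cleanly.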


> Again, it would already be a very good thing to fix the hash driver.
> 
The question is what you mean by "fix". Getting the driver to work for Eric 
is not such a big task. In fact I already have that part fixed in a local 
branch of DSPAM (I showed in an earlier message that this part is working). 
But getting the driver to be safe across word sizes is another issue.


> Documentation is also the key there - everything should be clearly stated in 
> the README and in the release
> notes.
> 
Well... the documentation is quickly fixed, but that will not prevent users 
from running into the segmentation fault. Way better would be to NOT crash, 
regardless of what the README says.


> > Would that be an issue or should I just go on and fix things and not care 
> > about the compatibility?
> 
> No, I think compatibility is less important than fixing things, though, in an 
> ideal world, 
> 
Well... since I have different needs for DSPAM, I started a local fork some 
time ago where I implement things I have always wanted to have in DSPAM. One of 
them is the use of fixed length types. I have a working local version in which 
the Hash driver has exactly the same structure on 32 and 64 bit. My local 
version DOES NOT fix the issue with Eric's CSS file. What it can do is emit an 
error message if the structure is not what the Hash driver expects, so I don't 
get a segmentation fault. It still does not work with Eric's file. However, I 
am pretty sure that files created with my modified Hash driver will not have 
the issues Eric is reporting. But I really need more testing, since I have 
changed a lot of code and have not had enough time to test everything.

You will surely now ask me to commit those changes. But you know what? The 
change is huge! I had to change almost every file in DSPAM. Committing 
something like that would be a huge task, and I don't know whether it would be 
okay for the 3.9.x release. I have also not fixed everything, nor have I 
checked all the storage backends. I can say that the MySQL driver is working 
and that the structure of the Hash driver is the same on 32bit and on 64bit. I 
have not yet fixed the endianness of the Hash driver. Next I will work on the 
PostgreSQL driver, then on the SQLite 2/3 drivers, and after that I will fix 
the endianness of the Hash driver.

My goal is to increase the stability of DSPAM, and I need/want the code to be 
predictable (currently it is not very predictable when you look at the Hash 
driver). I need a good foundation before I can even think of extending DSPAM, 
and I really need to extend it: I want to extend the DLMTP protocol so that I 
can decouple the Web-UI from the system where DSPAM is running, and I need to 
add additional features.


> Cheers,
> Julien
> 
-- 
Kind Regards from Switzerland,

Stevan Bajić


> -- 
> Julien Valroff <jul...@kirya.net>
> http://www.kirya.net
> GPG key: 1024D/9F71D449
> 17F4 93D8 746F F011 B845  9F91 210B F2AB 9F71 D449

_______________________________________________
Dspam-devel mailing list
Dspam-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-devel