Hi David,

I realize that we're getting off-topic for Jakarta Commons here ... ;-)

David Castro wrote on Tuesday, May 30, 2006 7:36 AM:
> Jörg Schaible wrote:
> [snip]
>> After a quick look over the package you get the impression
>> that you imported the magic codes of file magic into the
>> project. And then you're quite astonished if the library
>> does not detect simple formats (e.g. TIFF, Windows BMP) that
>> are no problem for the C counterpart. This is IMHO a problem,
>
> I did use the "magic" file to assist in generating the magic.xml file
> bundled with the project. You'll note that I have some, but not all,
> of the matches cleaned up and working. Actually, the file command will
> sometimes have incorrect matches itself, which I didn't want to
> inherit. So, I started with a small set of documents that I generated
> and ran them through unit tests to verify them.
>
> I would never be astonished that an alpha piece of open source software
> doesn't work exactly as expected or is limited in its out-of-box
> state. I only moonlight as an open source developer, as much as I'd
> like it to be my full-time job ;)

I assume most people here are in the same boat, including myself.

>> because there's simply no documentation that states
>> something else. When I discovered jMimeMagic, I just thought I'd
>> use it as a black box.
>
> Yeah, if you are looking for something that doesn't require a bit of
> elbow grease, jMimeMagic wouldn't be an optimal solution since it is
> early alpha open source software.

That's pretty normal, I think. And it's pretty normal for users to expect the opposite :D

> Nothing else existed out there when I started this project and I only
> had so many hours to devote to it. But let's get the engine revved up
> and make it more out-of-the-box-friendly. :)
>
> [snip]
>> But you could not decide what you wanted to implement.
>> See, file magic has two magic files, one to produce a format
>> description and one for the mime type. Your implementation
>> mixes the two approaches.
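To make the suggested split concrete, here is a minimal sketch of a mime-only detector. It checks a flat table of fixed-offset signatures and never walks nested description matchers. The class name MimeSniffer and its table are hypothetical and are not jMimeMagic's API. The byte signatures themselves are the well-known ones: TIFF "II*\0" / "MM\0*", BMP "BM", and an ID3v2 tag "ID3", which typically precedes MP3 data. The two-byte "BM" check is deliberately loose and can produce false positives on arbitrary data.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical mime-only sniffer: a flat table of fixed-offset byte
 * signatures, checked in order. No nested "description" matchers are
 * evaluated, so a lookup only touches the first few header bytes.
 */
public final class MimeSniffer {

    /** One fixed-offset signature and the mime type it implies. */
    private static final class Signature {
        final int offset;
        final byte[] magic;
        final String mimeType;

        Signature(int offset, byte[] magic, String mimeType) {
            this.offset = offset;
            this.magic = magic;
            this.mimeType = mimeType;
        }

        /** True if the header holds the magic bytes at the given offset. */
        boolean matches(byte[] header) {
            if (header.length < offset + magic.length) {
                return false;
            }
            for (int i = 0; i < magic.length; i++) {
                if (header[offset + i] != magic[i]) {
                    return false;
                }
            }
            return true;
        }
    }

    // Illustrative entries only; a real table would be generated from
    // mime-oriented magic definitions.
    private static final List<Signature> SIGNATURES = Arrays.asList(
        new Signature(0, new byte[] {'I', 'I', 0x2A, 0x00}, "image/tiff"), // little-endian TIFF
        new Signature(0, new byte[] {'M', 'M', 0x00, 0x2A}, "image/tiff"), // big-endian TIFF
        new Signature(0, new byte[] {'B', 'M'}, "image/bmp"),              // Windows BMP (loose)
        new Signature(0, new byte[] {'I', 'D', '3'}, "audio/mpeg")         // ID3v2 tag, usually MP3
    );

    private MimeSniffer() {
    }

    /** Returns the first matching mime type, or null if nothing matches. */
    public static String detect(byte[] header) {
        for (Signature s : SIGNATURES) {
            if (s.matches(header)) {
                return s.mimeType;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] bmp = {'B', 'M', 0x36, 0x00};
        System.out.println(detect(bmp)); // prints image/bmp
    }
}
```

Each lookup stops at the first hit and only reads header bytes. Its cost therefore does not grow with the number of descriptive matchers kept elsewhere. This mirrors how file keeps a separate magic file for mime types.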
>
> I decided exactly what I wanted to implement and what I wanted to
> prepare for (at least at the time). You're assuming that my intention
> was to simply duplicate the "file" utility, which isn't the case.
> Determining mime type was really only one of my intentions.

Well, by naming your project j*Mime*Magic, you imply something ;-)

> More important to me was actually determining the specific type and
> state of content in a stream of data. It was initially built as a
> helper library for a malware detection project.
>
>> Mime type detection is normally an action that should
>> happen *fast*, but if I request the mime type for an MP3 you
>> evaluate all the nested matchers that are totally moot for
>> the mime type.
>
> Now you are talking about optimization based on one of the specific
> uses of the library.

No, I am talking about your attempt to target two different things at the same time, and you cannot do both of them efficiently.

> I agree with you that there are some things that can certainly be
> done better/more efficiently. Those need to be identified and patched,
> but let's try not to throw the baby out with the bath water.

Split the result of the parser: create specialized matchers for mime type detection and for descriptive format detection. If you need to detect a mime type, it is typically something you wanna do on the fly - and fast.

>> Looking at the code:
>> [snip]
>> - your code is linked to Log4J. This is not good for
>>   libraries. See, some of our customers use their own
>>   logging implementations, but with commons-logging you can at
>>   least write an easy bridge
>
> Yup, I agree with you. Nobody has been pounding on the door asking for
> it and I had enough work on other projects to not concern myself too
> deeply with it.

Demand, demand :)

>> - you never guard log.debug with log.isDebug - and you create *a
>>   lot* of debug output
>
> Yup, certainly an area for making the library more efficient.
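For the logging point, here is a minimal sketch of the guard. It uses java.util.logging only so that it needs nothing outside the JDK. With commons-logging the equivalent check is Log.isDebugEnabled(), and Log4J's Logger has a method of the same name. The class GuardedLogging and its helpers describe() and matchAt() are hypothetical. The point is that the hex-dump string is only built when the level is actually enabled.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public final class GuardedLogging {
    private static final Logger LOG = Logger.getLogger(GuardedLogging.class.getName());

    // Counts how often the (hypothetical) expensive debug message was built.
    static int messagesBuilt = 0;

    /** Builds a costly debug string: a hex dump of up to 8 bytes. */
    static String describe(byte[] data, int offset) {
        messagesBuilt++;
        StringBuilder sb = new StringBuilder("testing offset ").append(offset).append(": ");
        for (int i = offset; i < Math.min(data.length, offset + 8); i++) {
            sb.append(String.format("%02x ", data[i]));
        }
        return sb.toString();
    }

    /** Hypothetical single-byte matcher with a guarded debug statement. */
    static boolean matchAt(byte[] data, int offset, byte expected) {
        // Guard: skip building the message entirely unless FINE is enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(describe(data, offset));
        }
        return offset < data.length && data[offset] == expected;
    }

    public static void main(String[] args) {
        byte[] data = {'B', 'M', 0x36, 0x00};
        boolean hit = matchAt(data, 0, (byte) 'B');
        System.out.println(hit + " " + messagesBuilt);
    }
}
```

The default JDK logging configuration sets the root level to INFO, which leaves FINE disabled. Under that configuration matchAt() does the comparison without ever formatting the dump. Inside a matcher loop that runs for every byte pattern, that difference adds up.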
> Again, completely aware of the issue...just haven't fixed it yet.
>
>> - file magic also has its limits, as already explained in
>>   this thread. You already introduced regexp support, but you
>>   don't use it properly, e.g. for the HTML types so far
>
> Definitely limits, and as I mentioned I was already moving and have
> already coded adjustments to support more of a pluggable matcher
> architecture.

This is the functionality that *I* am not that interested in ... the mime type can normally be detected quite easily with the standard patterns.

> And if my HTML regex matcher is broken...

Well, you have some of those non-regex, fixed-position HTML matching definitions in your magic.xml that are also present in file magic's definitions and that don't work too well.

> please send me a patch =) I've been calling for folks to help build out
> a complete set of matchers for more content types, but with limited
> responses.

Just to clarify: when I first looked at jMimeMagic, it was just a few days before you posted your call for help. So the project looked to me like a lot of other abandoned projects on SF, with a one-time dump of some experimental code. Therefore I wanna apologize for the bad overall impression of your project I gave in one of my first postings in this thread.

> I usually just scratch my own itches. I've also determined that I am a
> pretty lousy mind reader ;)

:)

>> OK, some of the problems would have been solved by
>> providing my own magic.xml file. E.g. one of my mistakes with
>> the library was that I assumed that the magic file was read
>> every time you create a Magic instance and you would have to
>> synchronize the initialization of the instance if you want
>> to share it. This assumption was wrong, but I only learned that
>> by looking at the code, not by reading the javadocs.
>
> Yeah...documentation is the first to go =( I try to keep my projects
> clean, organized, and as simple as possible though.
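On sharing an instance: one common way for a library to guarantee that its definition file is parsed once, and can be shared without synchronization on the caller's side, is the initialization-on-demand holder idiom. MagicDefinitions, load() and lookup() are hypothetical stand-ins. load() fakes parsing magic.xml with two hard-coded entries. This is not necessarily how jMimeMagic does it. It is only a pattern whose guarantee would be worth stating in the javadocs.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: parse the definition file once, lazily and
 * thread-safely, so callers can share the result freely.
 */
public final class MagicDefinitions {

    // Counts parses so the "load once" behavior can be checked.
    static int loadCount = 0;

    private final Map<String, String> signatures;

    private MagicDefinitions(Map<String, String> signatures) {
        // Immutable after construction, so sharing across threads is safe.
        this.signatures = Collections.unmodifiableMap(signatures);
    }

    /** Stand-in for parsing magic.xml; the real parser is not shown. */
    private static MagicDefinitions load() {
        loadCount++;
        Map<String, String> m = new HashMap<>();
        m.put("BM", "image/bmp");
        m.put("GIF8", "image/gif");
        return new MagicDefinitions(m);
    }

    // The JVM initializes Holder at most once, on first access to
    // INSTANCE, and class initialization is synchronized by the JVM, so
    // every thread sees the fully built instance.
    private static final class Holder {
        static final MagicDefinitions INSTANCE = load();
    }

    /** Returns the shared, lazily loaded definitions. */
    public static MagicDefinitions get() {
        return Holder.INSTANCE;
    }

    /** Looks up the mime type registered for a signature prefix. */
    public String lookup(String prefix) {
        return signatures.get(prefix);
    }
}
```

Repeated calls to MagicDefinitions.get() return the same instance, and load() runs only once. No explicit locking is needed, because the JVM initializes the Holder class at most once.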
> So if you browse, you should get a good feel for what is going on.
> It's not always beautiful or elegant, but you shouldn't find any
> obfuscated code...heh
>
> Thanks for the feedback. I understand it is always frustrating working
> with somebody else's code, so I'm sure it was less fun for you to deal
> with jMimeMagic than it typically is for myself. But let's make it
> better. I'd love to have other folks to collaborate with on this.

As you have seen from all the folks responding to this thread, there is a need for it and people are willing to do something. There's no need to bring it here to Jakarta Commons though; SF is totally fine.

- Jörg
