Thanks to the folks on the Wikipedia API mailing list, I found the problem:
BigInteger.toString(16) drops leading zeros, so the leading zero of the MD5
hex string was being eaten.

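This will fix it in ImageExtractor#getImageUrl:

    // An MD5 digest is 16 bytes, i.e. 32 hex digits. Pad back to 32 so that
    // more than one dropped leading zero is also restored.
    val result = (new BigInteger(1, messageDigest)).toString(16)
    val md5 = ("0" * (32 - result.length)) + result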
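
For anyone who wants to try this outside the framework, here is a minimal,
self-contained sketch of the same idea. The object name ImageUrlSketch and
the helpers toPaddedHex and urlPart are made up for illustration and are not
part of the extraction_framework; the explicit UTF-8 charset is also my
assumption (the original code uses the platform default).

```scala
import java.math.BigInteger
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical standalone sketch, not the framework's actual code.
object ImageUrlSketch {

  // Hex-encode a digest, restoring the leading zeros that
  // BigInteger.toString(16) drops (two hex digits per byte).
  def toPaddedHex(digest: Array[Byte]): String = {
    val hex = new BigInteger(1, digest).toString(16)
    ("0" * (digest.length * 2 - hex.length)) + hex
  }

  // Build the "h/hh/fileName" path fragment from the MD5 of the file name.
  // Assumption: the file name is hashed as UTF-8 bytes.
  def urlPart(fileName: String): String = {
    val md = MessageDigest.getInstance("MD5")
    val md5 = toPaddedHex(md.digest(fileName.getBytes(StandardCharsets.UTF_8)))
    md5.substring(0, 1) + "/" + md5.substring(0, 2) + "/" + fileName
  }
}
```

Going by the URLs quoted below, ImageUrlSketch.urlPart("Stewie_Griffin.png")
should come out as 0/02/Stewie_Griffin.png rather than 2/26/Stewie_Griffin.png.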

I would submit a patch, but I'm unsure how to do so.


On Sat, Dec 3, 2011 at 6:38 PM, Tommy Chheng <[email protected]> wrote:
> I'm using ImageExtractor#getImageUrl in the extraction_framework to
> get the url of an image.
>
>        val md = MessageDigest.getInstance("MD5")
>        val messageDigest = md.digest(fileName.getBytes)
>        val md5 = (new BigInteger(1, messageDigest)).toString(16)
>
>        val hash1 = md5.substring(0, 1)
>        val hash2 = md5.substring(0, 2);
>
>        val urlPart = hash1 + "/" + hash2 + "/" + fileName
>
> Most of the time, the function works correctly but on a few cases, it
> is incorrect:
>
> For "Stewie_Griffin.png", I get 2/26/Stewie_Griffin.png but the real
> one is 0/02/Stewie_Griffin.png
>
> The source file info is here:
> http://en.wikipedia.org/wiki/File:Stewie_Griffin.png
> http://upload.wikimedia.org/wikipedia/en/0/02/Stewie_Griffin.png
>
> Any ideas why the hashing scheme doesn't work sometimes?
>
> --
> @tommychheng
> http://tommy.chheng.com



-- 
@tommychheng
http://tommy.chheng.com

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
