Hi Tommy,

On 12/07/2011 12:13 AM, Tommy Chheng wrote:
> This solution is also flawed. Check with Batman_Kane.jpg
>
> I recommend using  org.apache.commons.codec.digest.DigestUtils#md5Hex
> Relying on a commonly used library is a lot less bug prone.
>
>
> On Mon, Dec 5, 2011 at 4:17 PM, Tommy Chheng<[email protected]>  wrote:
>> Thanks to the folks on the wikipedia api mailing list, the problem was
>> that the leading zero was being eaten.

I've already spotted the problem you mentioned and fixed it in our live
instance, which is available at http://live.dbpedia.org/sparql.
DBpedia-Live fixes articles as it encounters them during its run, so you
will not yet find a foaf:depiction predicate for every article, but over
time more and more articles will get their corresponding foaf:depiction
predicates.
We will also include the fix in the next release of DBpedia.
Please have a look at it and send me any feedback you have.
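
For reference, here is a minimal sketch of one way to keep the leading
zeros, by hex-encoding each digest byte as exactly two digits instead of
going through BigInteger. It is only an illustration and not necessarily
the change applied in DBpedia-Live; the names Md5PathSketch, md5Hex and
urlPart are made up for this example, and UTF-8 is assumed as the
encoding of the file name.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical helper, for illustration only.
object Md5PathSketch {

  // Encode every digest byte as two hex digits, so the result always has
  // 32 characters and leading zeros are never dropped.
  def md5Hex(s: String): String =
    MessageDigest.getInstance("MD5")
      .digest(s.getBytes(StandardCharsets.UTF_8)) // assumption: UTF-8 file names
      .map(b => "%02x".format(b & 0xff))
      .mkString

  // Build the "<h>/<hh>/<fileName>" part of the image URL from the hash.
  def urlPart(fileName: String): String = {
    val md5 = md5Hex(fileName)
    md5.substring(0, 1) + "/" + md5.substring(0, 2) + "/" + fileName
  }
}
```

With this, Md5PathSketch.urlPart("Stewie_Griffin.png") should give the
0/02/Stewie_Griffin.png path from your first mail. Because every byte is
padded on its own, the result does not depend on how many leading zeros
the hash happens to have.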

>> This will fix it in ImageExtractor#getImageUrl:
>>
>>      val result = (new BigInteger(1, messageDigest)).toString(16)
>>     val md5 = if (result.length % 2 != 0) "0" + result else result
>>
>> I would submit a patch but i'm unsure how to do so.
>>
>>
>> On Sat, Dec 3, 2011 at 6:38 PM, Tommy Chheng<[email protected]>  wrote:
>>> I'm using ImageExtractor#getImageUrl in the extraction_framework to
>>> get the url of an image.
>>>
>>>         val md = MessageDigest.getInstance("MD5")
>>>         val messageDigest = md.digest(fileName.getBytes)
>>>         val md5 = (new BigInteger(1, messageDigest)).toString(16)
>>>
>>>         val hash1 = md5.substring(0, 1)
>>>         val hash2 = md5.substring(0, 2);
>>>
>>>         val urlPart = hash1 + "/" + hash2 + "/" + fileName
>>>
>>> Most of the time, the function works correctly but on a few cases, it
>>> is incorrect:
>>>
>>> For "Stewie_Griffin.png", I get 2/26/Stewie_Griffin.png but the real
>>> one is 0/02/Stewie_Griffin.png
>>>
>>> The source file info is here:
>>> http://en.wikipedia.org/wiki/File:Stewie_Griffin.png
>>> http://upload.wikimedia.org/wikipedia/en/0/02/Stewie_Griffin.png
>>>
>>> Any ideas why the hashing scheme doesn't work sometimes?
>>>
>>> --
>>> @tommychheng
>>> http://tommy.chheng.com
>>
>>
>> --
>> @tommychheng
>> http://tommy.chheng.com
>
>


-- 
Kind Regards
Mohamed Morsey
Department of Computer Science
University of Leipzig


_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion