On Tue, Feb 25, 2014 at 12:38 PM, Markus Schaber <m.scha...@codesys.com> wrote:
> Hi,
>
> A coworker just consulted me on a performance problem of IronPython vs.
> CPython.
>
> ... snip ...
>
> On a closer look, there's the additional (and IMHO much worse) problem that
> the update() method seems not to work incrementally:
>
>     private void update(IList<byte> newBytes) {
>         byte[] updatedBytes = new byte[_bytes.Length + newBytes.Count];
>         Array.Copy(_bytes, updatedBytes, _bytes.Length);
>         newBytes.CopyTo(updatedBytes, _bytes.Length);
>         _bytes = updatedBytes;
>         _hash = GetHasher().ComputeHash(_bytes);
>     }
>
> In our use-case, this means that every file which is read leads to a
> reallocation, a copy, and a recalculation of the MD5 sum of all the data
> read so far. This is suboptimal from a memory and performance perspective.
>
> I'm not an expert on the .NET crypto APIs, but I guess there should be some
> incremental API available there which could be exploited.
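For comparison, CPython's `hashlib` md5 objects hash incrementally: each update() call feeds only the new bytes, so the total work is linear in the input. The sketch below is plain CPython, not IronPython code, and the class names `RehashEverything` and `Incremental` are hypothetical, for illustration only. It contrasts the pattern in the quoted update(), which re-copies and re-hashes the whole accumulated buffer on every call, with streaming updates.

```python
import hashlib


class RehashEverything:
    """Mimics the quoted C# update(): keep every byte, re-hash from scratch."""

    def __init__(self) -> None:
        self._bytes = b""
        self._digest = hashlib.md5(b"").digest()

    def update(self, new_bytes: bytes) -> None:
        # Build a new buffer holding old + new data (O(total) copy per call) ...
        self._bytes = self._bytes + new_bytes
        # ... then hash everything read so far (again O(total) per call).
        self._digest = hashlib.md5(self._bytes).digest()

    def digest(self) -> bytes:
        return self._digest


class Incremental:
    """Feeds only the new bytes to a streaming hash object."""

    def __init__(self) -> None:
        self._h = hashlib.md5()

    def update(self, new_bytes: bytes) -> None:
        # Work is proportional to len(new_bytes) only.
        self._h.update(new_bytes)

    def digest(self) -> bytes:
        return self._h.digest()


chunks = [b"first file", b"second file", b"third file"]
a, b = RehashEverything(), Incremental()
for chunk in chunks:
    a.update(chunk)
    b.update(chunk)

# Both yield the MD5 of the concatenated input; only the cost differs.
assert a.digest() == b.digest() == hashlib.md5(b"".join(chunks)).digest()
```

For n calls of k bytes each, the first class copies and hashes about k·n(n+1)/2 bytes in total, the second only k·n: the quadratic versus linear difference discussed in this thread.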
http://ironpython.codeplex.com/workitem/34022

I've also CC'd Emmanuel Chomarat, who was investigating a fix for this.
Unfortunately, I don't think there's an easy solution given how the .NET
APIs are constructed. Quoting from Emmanuel's email to me a while back:

"I am now using TransformBlock / TransformFinalBlock to compute the current
hash with linear complexity (whereas before we had n**2), but I am still
facing an issue.

First, we need a copy operation. This is not possible because we cannot
share a hash instance between two objects in .NET; the only way to make it
consistent with what Python does is to keep a copy of the full data in
MEMORY, so that a new instance can be created from that data when copy is
called.

Second, digest can be called several times in Python, with new data added
to the hash in between; in C#, once TransformFinalBlock has been called, no
more data can be added to the stream. Again, I have been able to use the
same technique, but at a memory cost plus a computation cost if
digest/hexdigest is called several times.

I don't see any way around this problem with an MS API that does not expose
its internal state the way the underlying MD5 library in Python does."

Basically, there's a mismatch between what .NET provides and what Python
needs for perfect compatibility. Keeping all data in memory is not
desirable, but neither is failing some operations. And I would *really*
prefer not to have to reimplement all of the cryptographic hash functions
Python has.

One option is to default to not buffering (and failing on certain
operations), and to offer a constructor flag that enables buffering to
allow the otherwise-impossible operations. Not my favourite idea, but
workable.

- Jeff

_______________________________________________
Ironpython-users mailing list
Ironpython-users@python.org
https://mail.python.org/mailman/listinfo/ironpython-users
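The buffering-flag option proposed in the reply can be sketched in plain Python. This is a minimal model under stated assumptions, not IronPython's implementation: `_OneShotEngine` is a hypothetical stand-in for a .NET hash driven through TransformBlock/TransformFinalBlock (data is streamed in, the stream is finalized once, and its state cannot be cloned), and `Md5Compat` with its `buffered` parameter is a hypothetical name for the constructor flag. By default it streams in linear time and raises on the operations such an engine cannot support; with `buffered=True` it keeps all data, so copy() and digest() after further update() calls work, at the memory and re-hashing cost described in Emmanuel's message.

```python
import hashlib
from typing import Optional


class _OneShotEngine:
    """Hypothetical model of a .NET-style hash: stream data in, finalize
    once, no way to clone the internal state."""

    def __init__(self) -> None:
        self._h = hashlib.md5()  # used only as an opaque streaming engine
        self._final: Optional[bytes] = None

    def transform_block(self, data: bytes) -> None:
        if self._final is not None:
            raise RuntimeError("stream already finalized")
        self._h.update(data)

    def transform_final_block(self) -> bytes:
        if self._final is None:
            self._final = self._h.digest()
        return self._final  # reading the result again is allowed


class Md5Compat:
    """Hypothetical md5 object: streaming by default, opt-in buffering."""

    def __init__(self, data: bytes = b"", buffered: bool = False) -> None:
        self._engine = _OneShotEngine()
        # Only buffered objects keep a copy of every byte seen.
        self._buffer: Optional[bytearray] = bytearray() if buffered else None
        self.update(data)

    def update(self, data: bytes) -> None:
        if self._buffer is not None:
            self._buffer += data
        else:
            # Linear cost; raises if digest() already finalized the stream.
            self._engine.transform_block(data)

    def digest(self) -> bytes:
        if self._buffer is None:
            return self._engine.transform_final_block()
        # Buffered: re-hash all stored data on every call (memory + CPU cost).
        fresh = _OneShotEngine()
        fresh.transform_block(bytes(self._buffer))
        return fresh.transform_final_block()

    def hexdigest(self) -> str:
        return self.digest().hex()

    def copy(self) -> "Md5Compat":
        if self._buffer is None:
            raise RuntimeError("copy() requires buffered=True")
        return Md5Compat(bytes(self._buffer), buffered=True)
```

With the default, `Md5Compat(b"abc").hexdigest()` works, but a later update() or copy() on that object raises; with `buffered=True` both succeed, matching CPython's behaviour at the cost of holding all input in memory.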