#12157: FileSystemStorage does file I/O inefficiently, despite providing
options to permit larger blocksizes
-----------------------------------------------------------+----------------
 Reporter:  alecmuffett                                    |       Owner:  nobody
   Status:  new                                            |   Milestone:
Component:  File uploads/storage                           |     Version:  1.1
 Keywords:  io, FileSystemStorage, buffering, performance  |       Stage:  Unreviewed
Has_patch:  1                                              |
-----------------------------------------------------------+----------------
 FileSystemStorage contains the following:


 {{{
     def _open(self, name, mode='rb'):
         return File(open(self.path(name), mode))
 }}}


 ...which is used to open files that are stored as FileFields in Django
 models.
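
 For context, a typical way this code path gets exercised is by streaming a
 stored file out via File.chunks(); the helper below is a hypothetical
 illustration, not part of Django:


 {{{
 from django.core.files.storage import FileSystemStorage

 storage = FileSystemStorage()

 def copy_stored_file(name, dest_path):
     # Hypothetical helper: storage.open() ends up in
     # FileSystemStorage._open(), and chunks() then reads the underlying
     # file object in DEFAULT_CHUNK_SIZE requests.
     source = storage.open(name)
     dest = open(dest_path, 'wb')
     try:
         for chunk in source.chunks():
             dest.write(chunk)
     finally:
         source.close()
         dest.close()
 }}}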


 If the programmer decides to hack through the file by using (for instance)
 the django.core.files.base.File.chunks() method:


 {{{
     def chunks(self, chunk_size=None):
         """
         Read the file and yield chunks of ``chunk_size`` bytes (defaults to
         ``UploadedFile.DEFAULT_CHUNK_SIZE``).
         """
         if not chunk_size:
             chunk_size = self.__class__.DEFAULT_CHUNK_SIZE

         if hasattr(self, 'seek'):
             self.seek(0)
         # Assume the pointer is at zero...
         counter = self.size

         while counter > 0:
             yield self.read(chunk_size)
             counter -= chunk_size
 }}}



 ...the programmer would expect self.read() - which drops through to
 django.core.files.base.File.read() - to honour its arguments, so that the
 I/O occurs in DEFAULT_CHUNK_SIZE blocks (currently 64kb); however, DTrace
 shows otherwise:



 {{{
 29830/0xaf465d0:  open_nocancel("file.jpg\0", 0x0, 0x1B6)         = 5 0
 29830/0xaf465d0:  fstat(0x5, 0xB007DB60, 0x1B6)                   = 0
 29830/0xaf465d0:  fstat64(0x5, 0xB007E1E4, 0x1B6)                 = 0 0
 29830/0xaf465d0:  lseek(0x5, 0x0, 0x1)                            = 0 0
 29830/0xaf465d0:  lseek(0x5, 0x0, 0x0)                            = 0 0
 29830/0xaf465d0:  stat("file.jpg\0", 0xB007DF7C, 0x0)             = 0 0
 29830/0xaf465d0:  write_nocancel(0x1, "65536 113762\n\0", 0xD)    = 13 0
 29830/0xaf465d0:  mmap(0x0, 0x11000, 0x3, 0x1002, 0x3000000, 0x0) = 0x7C5000 0
 29830/0xaf465d0:  read_nocancel(0x5, "\377\330\377\340\0", 0x1000)       = 4096 0
 29830/0xaf465d0:  read_nocancel(0x5, "\333\035eS[\026+\360\215Q\361'I\304c`\352\v4M\272C\201\273\261\377\0", 0x1000) = 4096 0
 ...
 ...(many more 4kb reads elided)...
 ...
 29830/0xaf465d0:  sendto(0x4, 0x7C5014, 0x10000)                  = 65536 0
 }}}


 ...reading the file in 4kb chunks (on OS X) and writing it out in 64kb
 blocks.


 The reason this occurs is that "open(self.path(name), mode)" is used to
 open the file, so reads go through the default libc stdio buffer, which is
 much smaller than the 64kb requested by the programmer.


 This can be kludged around by passing an explicit buffer size to the
 open() call:


 {{{
     def _open(self, name, mode='rb'):
         return File(open(self.path(name), mode, 65536))  # use a larger buffer
 }}}


 ...or by not using the stdio file()/open() calls at all and instead using
 os.open().
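
 A minimal sketch of that os.open() approach, assuming a read-only mode
 (the class and its behaviour are illustrative, not a proposed patch):


 {{{
 import os

 from django.core.files import File
 from django.core.files.storage import FileSystemStorage

 class LargeBufferStorage(FileSystemStorage):
     # Hypothetical subclass: open a raw descriptor and re-wrap it with an
     # explicit 64kb buffer, so File.read(chunk_size) calls translate into
     # comparably sized read(2) syscalls. Only read modes are handled here.
     def _open(self, name, mode='rb'):
         fd = os.open(self.path(name), os.O_RDONLY)
         return File(os.fdopen(fd, mode, 65536))
 }}}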

 In the meantime, this means that Django is not handling FileSystemStorage
 reads efficiently.

 It is not easy to determine whether this general stdio-buffer issue
 impacts other parts of Django's performance.

-- 
Ticket URL: <http://code.djangoproject.com/ticket/12157>