Re: [sqlite] BLOB data performance?
Hi all,

I don't want to drive anyone away from SQLite (not that I could anyway :-)), but a good solution for storing large amounts of data is HDF5.

HTH
-- ds
Re: [sqlite] BLOB data performance?
Asif Lodhi wrote:
> Interestingly, Microsoft's SourceSafe (at least VS-6.0) apparently
> uses the file system

It basically uses a whole bunch of directories and a scheme very similar to RCS to store the versioned content.

> while SVN uses Berkeley DB, as I read once.

SVN initially had only a Berkeley DB backend, and it drove people nuts. In particular, it used to keep getting wedged and required manual administrator intervention to fix. See this FAQ entry and the two that follow it:

http://subversion.tigris.org/faq.html#stuck-bdb-repos

SVN later added a filesystem-based backend that uses a directory to store the deltas for each revision, and that is by far the most popular.

The moral of the tale is to make sure your backend database library never needs human attention. I always wondered why they didn't use SQLite.

Roger
Re: [sqlite] BLOB data performance?
Hi,

On 11/15/07, Andreas Volz <[EMAIL PROTECTED]> wrote:
> On Tue, 13 Nov 2007 12:46:11 -0800 (PST), Ken wrote:
> > I think your blob file performance may greatly depend upon the
> > file system used and the workload.
> >
> > I found this article:
> >
> > http://oss.sgi.com/projects/xfs/papers/filesystem-perf-tm.pdf
>
> Very interesting document. But I can't know, or rely on, what
> filesystem the user has. There should be another way to ensure good
> file access speed.

Interestingly, Microsoft's SourceSafe (at least VS-6.0) apparently uses the file system, while SVN uses Berkeley DB, as I read once. Both products serve the same purpose but use different storage mechanisms. It boils down to what algorithm you are using for your particular problem.

Just my two cents.

--
Asif
Re: [sqlite] BLOB data performance?
Andreas Volz wrote:
> On Tue, 13 Nov 2007 15:15:49 -0600, John Stanton wrote:
> > You might find the method used by Squid to manage its cache would
> > be worth emulating.
>
> I don't know how squid works. Could you explain it in simple steps?

I haven't looked at the code, but it builds a tree of directories so that no directory is excessively loaded. You can download the source and take a look.

> > Using TransmitFile on Windows or sendfile on Unix to despatch the
> > file to the network is by far the most efficient way to pass on
> > files from a cache. It avoids a number of levels of buffer
> > shadowing.
>
> Hm, I looked into the manpage. I fetch the web page with curl
> directly into memory or into a file. I don't copy files around on the
> hard disk afterwards. I would assume curl does it in the most
> efficient way available.

sendfile and TransmitFile post the file almost directly to the network interface, without going through the multiple layers of intermediate buffering that happen if you use send. I find that using that interface in a web server results in very low overhead and consequently a large traffic-handling capacity.
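The sendfile idea can be sketched in Python on Linux, where os.sendfile wraps the same syscall. The function name and loop shape here are illustrative, not from the thread; error handling is elided:

```python
import os
import socket

def send_cached_file(sock, path):
    """Push the file at `path` to `sock` via sendfile, so the kernel moves
    the data without copying it through a userspace buffer."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            # sendfile may send fewer bytes than requested; loop until done.
            sent = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
            if sent == 0:
                break
            offset += sent
    return offset
```

Compared with read()/send() in a loop, this avoids one copy into the process and one back out, which is the "buffer shadowing" John mentions.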
Re: [sqlite] BLOB data performance?
On Tue, 13 Nov 2007 12:46:11 -0800 (PST), Ken wrote:
> I think your blob file performance may greatly depend upon the file
> system used and the workload.
>
> I found this article:
>
> http://oss.sgi.com/projects/xfs/papers/filesystem-perf-tm.pdf

Very interesting document. But I can't know, or rely on, what filesystem the user has. There should be another way to ensure good file access speed.

regards,
Andreas
Re: [sqlite] BLOB data performance?
On Tue, 13 Nov 2007 15:15:49 -0600, John Stanton wrote:
> You might find the method used by Squid to manage its cache would be
> worth emulating.

I don't know how squid works. Could you explain it in simple steps?

> Using TransmitFile on Windows or sendfile on Unix to despatch the
> file to the network is by far the most efficient way to pass on files
> from a cache. It avoids a number of levels of buffer shadowing.

Hm, I looked into the manpage. I fetch the web page with curl directly into memory or into a file. I don't copy files around on the hard disk afterwards. I would assume curl does it in the most efficient way available.

regards,
Andreas
Re: [sqlite] BLOB data performance?
You might find the method used by Squid to manage its cache worth emulating.

Using TransmitFile on Windows or sendfile on Unix to despatch the file to the network is by far the most efficient way to pass on files from a cache. It avoids a number of levels of buffer shadowing.

Andreas Volz wrote:
> [...]
> My main problem is to find a good algorithm to name the cached files
> and split them into directories.
> [...]
> Do you think this is a good way? Or do you have another idea?
Re: [sqlite] BLOB data performance?
I think your blob file performance may greatly depend upon the file system used and the workload.

I found this article:

http://oss.sgi.com/projects/xfs/papers/filesystem-perf-tm.pdf

Andreas Volz <[EMAIL PROTECTED]> wrote:
> [...]
> My main problem is to find a good algorithm to name the cached files
> and split them into directories.
> [...]
RE: [sqlite] BLOB data performance?
Regarding:

> The problem with many files in a directory:

Another possible algorithm: simply name the blob using the ROWID, e.g. Image0783.png.

In ancient days, a DOS directory of several hundred files might bog things down, but you can put thousands of files into a modern OS's directory if necessary. So even though you can't predict the exact size of the customer's data, you can probably get the maximum order of magnitude -- and that's all you need.

For example, you could create a two-level structure of 100 directories:

/0/0/
/0/1/
/0/2/
etc., up to /9/9/

Then a blob with a ROWID of 783 could be stored in directory /8/3/. (Alternately, if you want to allow for future three-level, four-level, etc. layouts, you could store it in /3/8/, with the first directory chosen from the LEAST significant digit, and get better dispersion.)
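The least-significant-digit variant described above might look like this sketch. The Image####.png naming and the ROWID 783 -> /3/8/ mapping come from the example; the function name and zero-padding width are my assumptions:

```python
import os

def blob_path(root, rowid):
    # Shard on the two LEAST significant digits of the ROWID, reversed,
    # so consecutive ROWIDs disperse across the 100 leaf directories:
    # ROWID 783 -> <root>/3/8/Image0783.png
    digits = f"{rowid:02d}"
    subdir = os.path.join(root, digits[-1], digits[-2])
    os.makedirs(subdir, exist_ok=True)
    return os.path.join(subdir, f"Image{rowid:04d}.png")
```

New rows land in rotating leaf directories (784 -> /4/8/, 785 -> /5/8/, ...), whereas sharding on the most significant digits would fill one directory at a time.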
Re: [sqlite] BLOB data performance?
On Tue, 13 Nov 2007 07:18:19 -0600, John Stanton wrote:
> In a cache situation I would expect that keeping the binary data in
> files would be preferable, because you can use far more efficient
> mechanisms for loading them into your cache and in particular for
> transmitting them downstream. Your DB only needs to store a pathname.
>
> Just be wary of directory size, and do not put them all in the one
> directory.

I noticed that problem in my current situation. I don't know the limits on file count and size in Linux or Windows, but I'm sure there is a limit.

My main problem is to find a good algorithm to name the cached files and split them into directories. My current idea is:

1) Put the URL into the DB
2) Use a hash function to create a unique name for the cache file
3) Insert the hash name into the same row as the URL

And for the problem of many files in a directory:

4) Use e.g. 'modulo 11' on the URL hash value to pick one of eleven
   directory names in which to find a file.

But this has the drawback of a static number of cache directories. The algorithm doesn't scale as the number of files grows.

Do you think this is a good way? Or do you have another idea?

regards,
Andreas
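Steps 1-4 above can be sketched as follows. The cache table schema, the SHA-1 choice, and a fan-out of 10 directories are my assumptions, not from the thread:

```python
import hashlib
import os
import sqlite3

NUM_DIRS = 10  # static fan-out -- the scaling drawback noted above

def cache_file(root, conn, url, data):
    # 2) hash the URL to get a unique, filesystem-safe cache file name
    name = hashlib.sha1(url.encode("utf-8")).hexdigest()
    # 4) take the hash value modulo NUM_DIRS to pick a subdirectory
    subdir = os.path.join(root, str(int(name, 16) % NUM_DIRS))
    os.makedirs(subdir, exist_ok=True)
    path = os.path.join(subdir, name)
    with open(path, "wb") as f:
        f.write(data)
    # 1) + 3) store the URL and the hash name in the same row
    conn.execute("INSERT INTO cache(url, hash) VALUES (?, ?)", (url, name))
    return path
```

A lookup then hashes the URL again (or reads the hash column) and recomputes the same subdirectory, so no directory listing is ever needed.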
Re: [sqlite] BLOB data performance?
In a cache situation I would expect that keeping the binary data in files would be preferable, because you can use far more efficient mechanisms for loading them into your cache and in particular for transmitting them downstream. Your DB only needs to store a pathname.

Just be wary of directory size, and do not put them all in the one directory.

Andreas Volz wrote:
> Hello,
>
> I'll tell you my current situation. I implemented a web cache
> function for images and other data in my application. In the past I
> saved the data on the hard disk under a special name and kept a text
> file with the "cache file name <-> URL" relation. But I didn't like
> that. Now I would like to evaluate SQLite as a solution.
>
> So my question is about the binary data. Is it better to insert the
> images and other media data (e.g. videos < 10 MB in size) into the
> DB, or only a "pointer" to a file on the hard disk? I would estimate
> a maximum DB size of several hundred MB.
>
> How good/bad is reading/writing this data as a BLOB compared to
> writing it as a file beside the DB with only a small name in the DB?
> What is the difference between the two approaches regarding memory
> and CPU usage?
>
> BTW: My current use case writes data slowly but reads data fast.
> Reading BLOBs must be as fast as reading from the hard disk.
>
> regards,
> Andreas
Re: [sqlite] BLOB data performance?
On Mon, 12 Nov 2007 23:41:59 +0000, [EMAIL PROTECTED] wrote:
> Andreas Volz <[EMAIL PROTECTED]> wrote:
> > [...]
>
> In my studies, BLOB I/O is faster than disk I/O for BLOBs of about
> 50KiB or less on Linux. Disk I/O is faster for larger BLOBs. I
> have received reports that the transition threshold is about 14KiB
> on win32. In my experiments, BLOB I/O is about 10% slower than
> direct disk I/O for multi-megabyte blobs.

Less than 50 KiB would be fine, at least for images and HTML files. I'll give it a try and do my own experiments. Thanks for sharing your experience so far.

regards,
Andreas
Re: [sqlite] BLOB data performance?
Andreas Volz <[EMAIL PROTECTED]> wrote:
> [...]
> How good/bad is reading/writing this data as a BLOB compared to
> writing it as a file beside the DB with only a small name in the DB?
> [...]
> BTW: My current use case writes data slowly but reads data fast.
> Reading BLOBs must be as fast as reading from the hard disk.

In my studies, BLOB I/O is faster than disk I/O for BLOBs of about 50KiB or less on Linux. Disk I/O is faster for larger BLOBs. I have received reports that the transition threshold is about 14KiB on win32. In my experiments, BLOB I/O is about 10% slower than direct disk I/O for multi-megabyte blobs.

--
D. Richard Hipp <[EMAIL PROTECTED]>
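These thresholds are easy to re-measure for a specific machine and filesystem. A rough sketch (the harness shape is my own; absolute numbers will vary, and this deliberately reads warm from the page cache, matching a read-heavy workload):

```python
import os
import sqlite3
import tempfile
import time

def bench(payload, reps=100):
    """Time repeated reads of `payload` as a SQLite BLOB vs. a plain file.
    Returns (blob_seconds, file_seconds)."""
    # BLOB path: one row in an in-memory table, read back `reps` times.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, data BLOB)")
    conn.execute("INSERT INTO t(data) VALUES (?)", (payload,))
    conn.commit()
    t0 = time.perf_counter()
    for _ in range(reps):
        conn.execute("SELECT data FROM t WHERE id = 1").fetchone()
    blob_s = time.perf_counter() - t0

    # File path: same bytes in a temp file, read back `reps` times.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    t0 = time.perf_counter()
    for _ in range(reps):
        with open(path, "rb") as f:
            f.read()
    file_s = time.perf_counter() - t0
    os.unlink(path)
    return blob_s, file_s
```

Running it across payload sizes (say 4 KiB to 4 MiB) should show the crossover region the measurements above describe.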