Hello,

Artyom wrote:

>> 3. take the first 4 or 8 bytes of the hash and use it as a
>> key for the 
>> new record.
> 
> And manage collisions manually.

Yes. I do not consider it to be a problem (it is neither a programming 
problem nor a performance problem).

>> as commit identifiers in Git, for example.
>>
> 
> This is great example. When I work with SVN I can remember quite 
> easily,

Which is a bad example, since in your web application you will rather 
not have to remember links with IDs pointing to particular data items. 
Instead, you will just embed these links in the page content.

Actually, there should be no reason to remember any link bigger than the 
domain name.

> The fix is done in changeset af36bc35fd35bcdf2
> 
> Try to remember it?

You have chosen 8-byte prefix, which in the world of integer IDs would 
possibly correspond to the number with ~20 digits (and if you are *sure* 
that your IDs will be always small, then pick shorter prefix).
I would not like to get into the discussion on which is easier to 
remember, as links are usually passed around without any necessity to 
remember them.

>   svn.boost.org/bugs/ticket/2340
> 
> or 
>  
>   svn.boost.org/bugs/ticket/23589062350234
> 
> I think this is clear.

Both are equally easy to click with the mouse.

> I agree, that in case of YouTube it is good if not the only possible
> solution, however, for vast majority of applications running sequence
> or auto increment is much better and appropriate solution.

Or not.
Consider this example: I log on to your web service and create a ticket 
(or whatever I can create there). I see number 1254. One week later I do 
the same and see number 1255. What does it mean? It means that there are 
no other users and I'm alone.
I can learn a lot about your business just by periodically analysing the 
links on your webpage.
Are you really sure you want to expose this information?

For another example, if I see such simple numbers in URLs, the first 
thing that comes to my mind is what would happen if I put a smaller 
number in the URL. Chances are that I would be able to force the system 
to display the information that does not belong to me.
Are you sure your system is immune to this kind of hack?

Consider now that I log to the site that uses hashes for IDs. There is 
no way to learn anything about the business by just analysing the links 
and it is also much more difficult to hack the system by inventing IDs - 
as inventing or guessing hashes would be rather ineffective.

>> The advantage of this solution is that such IDs are generated in a
>> widely scattered way (seemingly randomized within the target range), 
> 
> This is quite not good. And I explain why. Auto-incremented ids have
> locality according to time line. For example, vast majority of access
> to tickets or to some forum would be localized in some period of time
> (latest) so they would be in same place at disc, so locality would 
> be an advantage.

There is no relation between IDs and the storage locality of records.

(unless you use index-organized tables, where table and index is 
physically the same structure, but I've seen them only in high-end 
databases, which, as I understand, you are not using)

> In summary:
> 
> Not every generic best practice is best to apply everywhere, if I write
> some simple program that uses small database like Sqlite3 as data storage
> and requires some running ids. Sequences or auto-increment are perfect
> solutions for such purposes.

If you have a simple program that uses small database like Sqlite3 as 
data storage, then what is a problem of implementing the "sequence" as a 
simple incremented integer variable that is managed by the program and 
not by the database? In this case the "last_insert_id" is just the value 
that you got after you incremented the counter the last time - I bet it 
would even be faster in operation.

The advantage of database-managed sequences is that they work well with 
transactions and multiple client connections. In the conditions that you 
describe (Sqlite3), these potential advantages are irrelevant.

Regards,

-- 
Maciej Sobczak * www.msobczak.com * www.inspirel.com

------------------------------------------------------------------------------
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com
_______________________________________________
Soci-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/soci-users

Reply via email to