[sqlite] Avoid duplicate sets with join table

R Smith Sat, 16 Apr 2016 02:53:35 +0200

I would simply use a fourth table that is essentially itself just an 
Index, say:
CREATE TABLE records_features_u (
   id_r INT NOT NULL,
   combo_features TEXT NOT NULL,
   PRIMARY KEY (id_r, combo_features)
) WITHOUT ROWID;


IF SUCCEED(
     INSERT INTO combo_features_u 10, "20,21,22";
) THEN {
   INSERT INTO records_features VALUES (10,20), (10,21), (10,22);
}

This way you maintain the full linking as planned, and also have a 
unique index to easily match prospective additions against.
Note that your software needs to make sure the combination features 
inserted need to be sorted, since "20,21,22" would not match "22,20,21", 
but it would still be the same feature-set.

Another way would be something like this:

IF NO_ROWS_ARE_RETURNED_FOR (
     SELECT 1 FROM combo_features_u WHERE id_r = 10, AND combo_features 
= "20,21,22";
) THEN {
   INSERT INTO combo_features_u 10, "20,21,22";
   INSERT INTO records_features VALUES (10,20), (10,21), (10,22);
}


Good luck,
Ryan

On 2016/04/16 1:20 AM, Ketil Froyn wrote:
> I have two tables and a join table, in principle like this:
>
> CREATE TABLE records (id INTEGER PRIMARY KEY, data TEXT);
> CREATE TABLE features (id INTEGER PRIMARY KEY, data TEXT UNIQUE);
> CREATE TABLE records_features (id_r INTEGER, id_f INTEGER, ord INTEGER);
>
> A record consists of one or more features, and a feature can exist in
> multiple records. The "ord INTEGER" is there to keep track of the
> order of the features. I want to avoid duplicate feature sets. We can
> assume that the ord values will match for two sets.
>
> So let's say there's a relationship stored in the DB:
>
> INSERT INTO records (data) VALUES("record1");
> => 10
> INSERT INTO features (data) VALUES("feature1");
> => 20
> INSERT INTO features (data) VALUES("feature2");
> => 21
> INSERT INTO features (data) VALUES("feature3");
> => 22
> INSERT INTO records_features (id_r, id_f, ord) VALUES(10,20,0);
> INSERT INTO records_features (id_r, id_f, ord) VALUES(10,21,1);
> INSERT INTO records_features (id_r, id_f, ord) VALUES(10,22,2);
>
> Later, if I come across a new record that has exactly features
> "feature1", "feature2" and "feature3" (IDs 20,21 and 22), I don't want
> to add a new record for this. So given a new set of feature IDs, how
> can I best check if there's another record that has these exact
> features before I insert it?
>
> To be clear, a record with features "feature1", "feature2", "feature4"
> would be ok. So would "feature1", "feature2". Subsets are ok, but not
> exact duplicates.
>
> One way could be to drop the join table and do a string concat of the
> feature IDs in a UNIQUE column in the record table, ie:
>
> CREATE TABLE records (id INTEGER PRIMARY KEY, data TEXT, feature_ids
> TEXT UNIQUE);
>
> INSERT INTO records (data, feature_ids) VALUES("record1", "20,21,22");
>
> but I'd prefer to avoid manually having to parse the IDs out of a
> string. It does appear to be a hairy way to solve what I need though.
>
> A combination of the two would be better. The string "20,21,22" would
> function as a unique key avoiding duplicates, and if I also keep the
> join table I can still query the database without parsing strings. I
> won't be doing many updates, so I'm not really concerned about editing
> these strings.
>
> But I'm sure there's a better way. Do I need to use a CTE for this?
>
> -Ketil
> _______________________________________________
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

[sqlite] Avoid duplicate sets with join table

Reply via email to