Re: [Bacula-devel] Query changes in the catalog browser and indexes

Marc Cousin Sun, 26 Aug 2007 04:24:52 -0700

On Sunday 26 August 2007 00:52:33 demerphq wrote:
> On 8/25/07, Dan Langille <[EMAIL PROTECTED]> wrote:
> > On 25 Aug 2007 at 10:51, Eric Bollengier wrote:
> > > On Saturday 25 August 2007 10:31:04 you wrote:
> > > > Hello Marc,
> > > >
> > > > Thanks for the information.  I think this is something that is
> > > > probably well worth the effort to put in the core Bacula code.  I'm
> > > > going to study it a bit to see how it might be organized into a
> > > > project.  Code like this might also help a lot for the simple
> > > > restores that Bacula currently does by speeding up the retrieval.
> > > >
> > > > Perhaps Eric could send me the SQL that creates the tables so that I
> > > > can see more clearly all the fields.
> > >
> > > Sql definitions :
> > >
> > >     CREATE TABLE brestore_knownjobid
> > >     (
> > >      JobId int4 NOT NULL,
> > >      CONSTRAINT brestore_knownjobid_pkey PRIMARY KEY (JobId)
> > >     );
> > >
> > >    CREATE TABLE brestore_pathhierarchy
> > >    (
> > >      PathId int4 NOT NULL,
> > >      PPathId int4 NOT NULL,
> > >      CONSTRAINT brestore_pathhierarchy_pkey PRIMARY KEY (PathId)
> > >    );
> > >
> > >    CREATE INDEX brestore_pathhierarchy_ppathid
> > >                           ON brestore_pathhierarchy (PPathId);
> > >
> > >     CREATE TABLE brestore_pathvisibility
> > >     (
> > >       PathId int4 NOT NULL,
> > >       JobId int4 NOT NULL,
> > >       Size int8 DEFAULT 0,                -- used for statistics
> > >       Files int4 DEFAULT 0,               -- used for statistics
> > >       CONSTRAINT brestore_pathvisibility_pkey PRIMARY KEY (JobId,
> > > PathId) );
> > >
> > >     CREATE INDEX brestore_pathvisibility_jobid
> > >                           ON brestore_pathvisibility (JobId);
> > >
> > >
> > > For example, this is how we list files in a directory :
> > >
> > > SELECT File.FilenameId, listfiles.id, listfiles.Name, File.LStat,
> > > File.JobId FROM
> > >       (SELECT Filename.Name, max(File.FileId) as id
> > >        FROM File, Filename
> > >        WHERE File.FilenameId = Filename.FilenameId
> > >          AND Filename.Name != ''
> > >          AND File.PathId IN (     10,20,20202,20202        )
> > >          AND File.JobId IN (        1,2,3,4           )
> > >        GROUP BY Filename.Name
> > >        ORDER BY Filename.Name) AS listfiles,
> > > File
> > > WHERE File.FileId = listfiles.id
> > >
> > > I think, i will wrote a new class (something like Bvfs) which will
> > > do basic operations like in a real filesystem.
> > >
> > > ch_dir(pathid)
> > >       Change current directory to pathid
> > > ls_dirs()
> > >       List all directories in the current directory (pathid and jobid)
> > > ls_files()
> > >       List all files in a the current directory (pathid and jobid)
> > > up_dir()
> > >       Change to parent directory
> > > pwd()
> > >       Get the current pathid
> > > get_pathid(path)
> > >       Return pathid from a given path
> > > get_root()
> > >       Get root pathid
> > > get_all_file_versions(...)
> > >       Get all versions for a file
> >
> > Can you explain the pros and cons over something like a parent-child
> > relationship?  That each, each file entry has a link to the parent
> > directory.  This would be a self referential table, with some entries
> > being files, some being directories.  This is what I use for the
> > FreshPorts database.
> >
> > See http://news.freshports.org/2007/08/03/freshports-database-primer/
>
> If lookup time is the objective then parent pointers are not the ideal
> way to go, especially in SQL. Look up Celkos algorithms on storing a
> nested heirarchy that allows containment queries using indexes.
> (Nested set representation).
>
> The idea is fairly straight forward, each record has a Left and Right
> value, any children will have left and right values such that P.Left
> <= C.left and C.right <=P.right. An index on both means that full
> heirarchical queries can be done efficiently. The trade off of course
> is that insert/updates are more expensive, however if I understand it
> right the scenario we are discussing is fetch heavy, in which case
> using the nested set representation will probably be a net win.
>
> Just a thought.
>
> Yves


I see what you mean, and it's very interesting if we need to build the whole 
tree or a full branch very fast.

The point is, we don't need to be able to do a full hierarchical query for 
what we are working on. What we need for queries are :
give me the parent dir of current dir
give me all children dirs of current dir.
Neither of those are fetch heavy, we're talking of about 10 to 1000 rows.
In fact, i think it's counterproductive to retrieve all the tree : we waste 
cpu cycles on the director, in the databse server, we waste bandwidth between 
the server and the client, and we waste memory on the client side.

there is no point (I think) in retrieving all the tree of a server, as most 
users will only want to retrieve some very small parts of it (typically 5 to 
10 directories, while going downwards in a precise branch of the tree where 
they know the files are)

both can be done very simply with parent pointers.


-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel

Re: [Bacula-devel] Query changes in the catalog browser and indexes

Reply via email to