The performance of SQLite doesn't matter, as it is restricted to a single worker anyway -- it's definitely not recommended for running in production.

-ash

> On 16 Aug 2018, at 08:33, George Leslie-Waksman <waks...@gmail.com> wrote:
>
> These performance characteristics also depend on the metadata database
> backend. If there are benchmarks, I would hope we look at them across
> sqlite, mysql, postgresql, and any other supported backends before we take
> action.
>
> On Thu, Aug 9, 2018 at 12:41 PM Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
>
>> The change in perf for the DAG table would be extremely negligible.
>>
>> Maybe for task_instances (large table with millions of rows, 3-field
>> composite key) it *could* be a decent idea. Though you'd then have two
>> indexes to store and maintain, and we may have to change the code to
>> actually use and reference that new, more efficient pk in places where
>> it's more efficient to use that index (some of it SQLAlchemy would do
>> right out of the box).
>>
>> This mostly affects the index size (btree(id) is much smaller than
>> btree(dag_id, task_id, execution_date)), not so much the key lookup time,
>> which is log(n). We'd still have to use the composite btree when we want
>> to do range scans, which we use frequently to get sets of tasks for a dag
>> or a specific dag task. Since lookups are log(n), and we need to maintain
>> that composite btree anyway for range scans, I don't see where that would
>> really help. It would be a better index (fewer pages, less memory usage,
>> ...) if we didn't need that other composite one, which we do.
>>
>> Max
>>
>> On Thu, Aug 9, 2018 at 8:05 AM Vardan Gupta <vardangupta...@gmail.com>
>> wrote:
>>
>>> Point well taken on backward compatibility. We will have to approach
>>> this change very diligently, if it is implemented.
>>>
>>> On Thu, Aug 9, 2018 at 7:29 PM Юли Волкова <xnuins...@gmail.com> wrote:
>>>
>>>> Because what you described says nothing about backward compatibility.
>>>> Do you think everyone who uses it should have to change all their
>>>> DAGs? It's very strange, because you are proposing one of the most
>>>> critical changes, and it will affect everyone. If you want an id, call
>>>> it dag_metadata_id and add it. But don't propose a change that isn't
>>>> backward compatible. It's too strange.
>>>>
>>>> On Thu, Aug 9, 2018 at 7:04 AM vardangupta...@gmail.com <
>>>> vardangupta...@gmail.com> wrote:
>>>>
>>>>> On 2018/08/09 11:55:11, Ash Berlin-Taylor <a...@apache.org> wrote:
>>>>>> Absolutely - there will still need to be a human-readable DAG id,
>>>>>> even if we end up with an auto-incrementing integer ID column
>>>>>> internally for table join performance reasons.
>>>>>>
>>>>>> -ash
>>>>>>
>>>>>>> On 9 Aug 2018, at 12:35, Юли Волкова <xnuins...@gmail.com> wrote:
>>>>>>>
>>>>>>> How will you understand what your DAG 00002 is doing without
>>>>>>> opening it? For each of 100, for example? Especially if you are
>>>>>>> not the developer who created it, but a support team with 120 DAGs.
>>>>>>>
>>>>>>> This is the first time I have wanted to send an answer to the dev
>>>>>>> mailing list as well. Please, don't do it.
>>>>>>>
>>>>>>> I think it will be really bad for everyone who uses dag_id as a
>>>>>>> descriptive name for the DAG. If I look at 0329313, it does not
>>>>>>> tell me anything useful, and it will be very complicated to
>>>>>>> identify which process the DAG is used for. There could be another
>>>>>>> id for the indexes in the DB if it's a real problem for somebody.
>>>>>>> But, please, do not change dag_id.
>>>>>>>
>>>>>>> On Mon, Aug 6, 2018 at 1:32 AM vardangupta...@gmail.com <
>>>>>>> vardangupta...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Everyone,
>>>>>>>>
>>>>>>>> Do we have any plan to change the type of dag_id from String to
>>>>>>>> Number? This would make queries on the metadata more performant.
>>>>>>>> The proposal could be to generate an auto-incrementing value in
>>>>>>>> the dag table and use this id in the other tables.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Vardan Gupta
>>>>>>>
>>>>>>> --
>>>>>>> _________
>>>>>>>
>>>>>>> Best regards, Юлия Волкова
>>>>>>> Tel.: +7 (911) 116-71-82
>>>>>
>>>>> Thanks Ash for your reply, I am aligned with what you're saying.
>>>>>
>>>>> I was not proposing to take away the human-readable dag_id. Instead I
>>>>> was thinking: why can't we create another field, like dag_name, which
>>>>> will hold this information in all front-facing places, while dag_id
>>>>> is changed to an integer? This will help make joins faster in the
>>>>> metastore. Though dag_id is currently indexed, indexing varchar(250)
>>>>> rather than int (4 bytes) takes more index blocks and therefore more
>>>>> lookup time. Also, if dag_id is not trivial to change to int, let it
>>>>> stay, and let's introduce another column which is actually an integer
>>>>> and let joins happen on this column across all tables.
>>>>
>>>> --
>>>> _________
>>>>
>>>> Best regards, Юлия Волкова
>>>> Tel.: +7 (911) 116-71-82
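
For illustration only, here is a minimal sketch of the idea discussed in this thread: keep the human-readable dag_id and add an auto-incrementing integer that other tables join on. The schema is hypothetical and is not Airflow's actual metadata schema; in particular the `dag_pk` column and the helper functions are made-up names. SQLite is used purely because it ships with Python, not as a suggestion for production.

```python
import sqlite3


def build_demo(conn):
    """Create a hypothetical two-table schema with an integer surrogate key."""
    conn.executescript("""
        CREATE TABLE dag (
            id     INTEGER PRIMARY KEY AUTOINCREMENT,  -- internal join key
            dag_id VARCHAR(250) NOT NULL UNIQUE        -- human-readable name, unchanged
        );
        CREATE TABLE task_instance (
            dag_pk         INTEGER NOT NULL REFERENCES dag(id),
            task_id        VARCHAR(250) NOT NULL,
            execution_date TEXT NOT NULL,
            state          TEXT,
            -- A composite key is still kept so that "all tasks of one dag"
            -- style range scans have an index to use.
            PRIMARY KEY (dag_pk, task_id, execution_date)
        );
    """)


def add_task_instance(conn, dag_id, task_id, execution_date, state):
    """Register the dag by name if needed, then store the row under its integer key."""
    conn.execute("INSERT OR IGNORE INTO dag (dag_id) VALUES (?)", (dag_id,))
    (dag_pk,) = conn.execute(
        "SELECT id FROM dag WHERE dag_id = ?", (dag_id,)
    ).fetchone()
    conn.execute(
        "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
        (dag_pk, task_id, execution_date, state),
    )


def task_instances_for(conn, dag_id):
    """Look up by the readable name; the join itself happens on the integer."""
    return conn.execute(
        """
        SELECT d.dag_id, t.task_id, t.execution_date, t.state
        FROM task_instance AS t
        JOIN dag AS d ON d.id = t.dag_pk
        WHERE d.dag_id = ?
        ORDER BY t.task_id, t.execution_date
        """,
        (dag_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_demo(conn)
add_task_instance(conn, "daily_sales_report", "extract", "2018-08-09", "success")
add_task_instance(conn, "daily_sales_report", "load", "2018-08-09", "running")
add_task_instance(conn, "hourly_cleanup", "purge", "2018-08-09", "success")

for row in task_instances_for(conn, "daily_sales_report"):
    print(row)
```

Users and the UI still see only the dag_id string. Whether the smaller join key pays off in practice would need benchmarks per backend, as noted earlier in the thread.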