Hey David,

You know, you say that RAM is the primary limiting factor but then also say
you're only interested in actual timing tests. Umm. Seems like the coding
version of the building contractor talking to a client:

You can have it fast
You can have it cheap
You can have it done well

Pick 2

Sub 'small RAM footprint' for 'cheap' and 'reliable code' for 'done well' and
here we are. ;-)

I haven't done any timing tests of ORDA vs classic (as I call it) 4D. So
I'm rattling off in exactly the way you didn't want. That's what friends
are for.

I suspect ORDA is a bit slower on some things. I also suspect that will
change in coming updates. But a more relevant point I've noticed is that ORDA
really seems to be a different internal engine. Consequently I don't think
the existence of ORDA has changed classic 4D ops. I expect something like
GOTO RECORD is doing the same thing in v17 that it did before. There might be
some ad hoc optimization but I doubt it's much.

I also doubt there's much benefit in reaching for one or two ORDA
commands in the midst of an otherwise 'classic' set of commands. The reason
is that ORDA uses a different approach to manipulating the data: ORDA
manages references to data while classic more often manages actual data.
GOTO RECORD is a great example. When I hear Thomas Maul talk about
various things being "super optimized" in ORDA I take that to refer to this
aspect of the way it works. To use ORDA commands you have to have an ORDA
data structure. Easy enough to create but these are not the same as the
classic ones (current record, current selection, sets, etc.). So there's
that 'penalty' to pay moving from one approach to the other. This leads me
to conclude I need to make a choice when deciding how a particular process
(not referring to a 4D process, just a segment of code) is going to be
built: classic or ORDA.

So in this case it sounds like there are really two considerations that impact
RAM: 1) selecting or getting the data and 2) building the text thingy. I
bet you would see less RAM consumed using strictly ORDA commands because
you're going to have millions of references instead of millions of
representations of the data.

Then comes assembling the text and for that I don't think it will matter
how you manage the data. A couple of thoughts:
 - open a disk file and write the text to it (sketch below)
 - write the text to a record

Both of these are slower than RAM, so maybe you look at the number of records
and go to disk when the count is large and stay in RAM when it's small.
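
For the disk file option I'm picturing something like this - untested, and
[Stuff]/SomeField are just stand-ins (SomeField assumed to be a text field):

C_TIME($doc)
C_TEXT($chunk)
  // assumes you've already run your QUERY on [Stuff]
FIRST RECORD([Stuff])
$doc:=Create document("export_dump.txt")  // made-up file name
If (OK=1)
  While (Not(End selection([Stuff])))
    $chunk:=[Stuff]SomeField+Char(Carriage return)
    SEND PACKET($doc;$chunk)  // append straight to disk, RAM stays flat
    NEXT RECORD([Stuff])
  End while
  CLOSE DOCUMENT($doc)
End if

The record version is the same idea: accumulate a chunk, stuff it into a text
field, SAVE RECORD, start the next chunk.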

And just to be on the record I know there's no technical reason you can't
mix ORDA and classic commands in the same little chunk of code. But if you
start off using classic 4D there's some point where you need to build an
ORDA representation of that data. At that point you've spent time doing the
transfer and now you essentially have two versions of the same data. For
most things this doesn't matter. For others, like millions of records, I
think it does. Plus the suite of ORDA commands actually gives you
pretty much all the capabilities we're accustomed to. You just have to
think in that context. Which is a PITA when you've got years of classic 4D
know-how.
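
To make the handoff concrete, this is the kind of thing I mean (basically the
same snippet you show in your message below):

C_OBJECT($stuff_es;$stuff_entity)
QUERY([Stuff];[Stuff]Counter>=10000)  // classic: builds a current selection
$stuff_es:=Create entity selection([Stuff])  // ORDA: a second, reference-based view of the same rows
For each ($stuff_entity;$stuff_es)
  // from here on you're working with entity references, not loaded records
End for each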

On Sat, Sep 15, 2018 at 12:08 AM David Adams via 4D_Tech <
4d_tech@lists.4d.com> wrote:

> Short version:
> I need to load some fields from records into a big text thingy.
>
> The code runs on the server-side only.
>
> I'm keen to preserve RAM.
>
> What are the trade-offs in V17 between *GOTO SELECTED* record and
> *SELECTION
> TO ARRAY*? I've been using *SELECTION TO ARRAY*, but it's hard to read,
> write, and maintain. And, I realized, might be de-optimized for memory
> because you have to load all of the data you're processing into arrays.
> (Yes, you can chunk it, but that doesn't change the fundamental point that
> you pre-load a lot of data.)
>
> Any test results or thoughts? I considered a fair range of options and did
> comparison tests on none. The long version below includes more details on
> the two solutions I'm down to, plus the ideas that I discarded.
>
>
> TL;DR version
> I'm working in V17 and I'm hoping that someone has done some real-world
> tests already that could help me out with a question. Here's the setup: I
> need to load up some fields from lots of records and push them into an
> external system. It's going to Postgres, but that's not an important
> detail, the result is a ginormous text object. The result could just as
> well be a text or JSON file dump. The main constraint is available memory.
> Performance matters when there are millions of records but, typically, the
> only important consideration is memory. As far as the final solution goes,
> it's ideally code that's easy to write, read, and maintain. As a plus, we
> can position the code to run server side, so client-server optimization
> isn't an issue. And, for the record, in lots of cases there isn't enough
> data to make memory an issue at all, so readable reliable code is
> definitely a preference.
>
> Note: Yes, I can chunk data in ranges, etc. to keep things within my memory
> footprint. I'm doing that....but the question still remains
>
> Here are the solutions I've come up with:
>
> *QUERY* and a *For* loop with *GOTO SELECTED RECORD*.
> Easy to read, write and maintain. But when you use *GOTO SELECTED RECORD*,
> do you get the whole record in V17? Without fat fields? Since this is
> server-side or stand-alone, should I care? On the upside, you're only
> loading one record at a time, so only burning through memory for that
> record while you use it.
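>
> For reference, the shape of the loop I mean (untested sketch; [Stuff] is the
> same throwaway table as in the examples further down, and I'm assuming
> [Stuff]ID is a text field):
>
> *C_LONGINT*($i)
> *C_TEXT*($output_text)
> *QUERY*([Stuff];[Stuff]Counter>=10000)
> *For* ($i;1;*Records in selection*([Stuff]))
>   *GOTO SELECTED RECORD*([Stuff];$i) // loads the $i-th record of the selection
>   $output_text:=$output_text+[Stuff]ID+*Char*(*Carriage return*)
> *End for*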
>
> *SELECTION TO ARRAY* and a *For* loop
> This is what I have been doing....based on old habits as much as anything.
> Yes, you only get the columns you want, but it gets _all_ of the rows at
> once. So, you burn up a lot of memory with the arrays and then duplicate++
> that memory when building up the output. On the code side, that kind
> of *SELECTION
> TO ARRAY*-loop-read by index code is ugly, tedious to write, and tedious to
> maintain. It's clear(ish) and reliable, but only worth it if it pays for
> itself somehow. In other words, it has to be a good deal better than *GOTO
> SELECTED RECORD* to be worth it. Says the guy who has been doing all
> *SELECTION
> TO ARRAY* forever.
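>
> For comparison, the shape of the *SELECTION TO ARRAY* version (untested; ID
> and Name are stand-in text fields):
>
> *ARRAY TEXT*($aID;0)
> *ARRAY TEXT*($aName;0)
> *C_LONGINT*($i)
> *C_TEXT*($output_text)
> *SELECTION TO ARRAY*([Stuff]ID;$aID;[Stuff]Name;$aName) // every row loaded up front
> *For* ($i;1;*Size of array*($aID))
>   $output_text:=$output_text+$aID{$i}+*Char*(*Tab*)+$aName{$i}+*Char*(*Carriage return*)
> *End for*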
>
> Entity Selection and a *For* or *For each* loop
> I have no clue why an entity selection is *C_OBJECT* instead of
> *C_COLLECTION*, to give you a sense of how much I know about this stuff. I
> was happy to discover that you can easily create an entity selection from a
> current selection, so old style queries work fine:
>
> *C_OBJECT*($stuff_es)
> *QUERY*([Stuff];[Stuff]Counter>=10000)
> $stuff_es:=*Create entity selection*([Stuff])
>
> The resulting *For*/*For each* loop code is very readable, it's == *GOTO
> SELECTED RECORD*, but with a different syntax. Otherwise, same same. I
> *suspect* that the memory use here is excellent. I'm guessing that as you
> navigate through the entity selection, you're only really pulling the data
> you use. But maybe not. If you do a For each, you get an object (entity)
> with all of the fields. So, possibly this approach is even worse than
> *GOTO
> SELECTED RECORD* which, I'm guessing, doesn't load as many fields. I
> haven't tested these points out in any way. If anyone has dug into this, it
> would be great to know about the difference (if any) in what 4D loads when
> you:
>
> -- Use *GOTO SELECTED RECORD*
>
> -- Use a *For each* loop on an entity selection, which builds an
> $entity_object which you can then read/write to/from like $entity_object.ID
>
> -- Use a *For* loop on an entity selection and then reference
> $specific_es[0].ID (sketch just below)
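>
> For that last one, I mean an index-style loop like this (untested):
>
> *C_LONGINT*($i)
> *For* ($i;0;$stuff_es.length-1) // entity selections are 0-indexed
>   $output_text:=$output_text+$stuff_es[$i].ID+*Char*(*Carriage return*)
> *End for*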
>
> It's pretty easy to imagine different ways that 4D might have implemented
> things that are more or less efficient in each of these cases. I have no
> idea what they actually did. I'm kind of curious about this behavior in V17,
> but have already talked myself out of using entity selections. Why? Because
> the table and field references are brittle and *case-sensitive*. Man, I
> truly hate case-sensitive names. When do I want them? Never. Not once, and
> I never will. This isn't all on 4D, many languages are case-sensitive. It
> makes sense if you're a computer. I'm not a computer, I'm a person...to me
> it's just horrible. Anyway, not exclusively a 4D problem...because in 4D you
> can avoid it altogether.
>
> For those that haven't been following along at home, here's a hello world
> level V17 For each loop over an entity selection:
>
> *C_OBJECT*($stuff_object)
> *For each* ($stuff_object;$stuff_es) // The loop automatically populates
> $stuff_object as it iterates through the list.
> $output_text:=$output_text+$stuff_object.ID+*Char*(*Carriage return*)
> *End for each*
> See that $stuff_object.ID statement? The ID part = [Stuff]ID. It's all
> case-sensitive. Rename the field in the structure to id three months from
> now and the code above breaks. And by "breaks" I mean you don't get a compiler
> error, you don't get a syntax error in the Method Editor, and you likely
> don't get a runtime error. Your code just screws up silently. So, yeah, not
> going that way.
>
> *Note*: Collections are very handy when the source data is a big static
> JSON. It makes the static values highly interactive. I wrote a little
> screen like that last week and loved the results.
>
> *Note*: In a *For each* loop, I can't find a way to read the index of the
> current item. Like, that you're on item 23. You can get the total item
> count with .length, but I see no way to get the current index. The same goes
> for collections. It can be useful when you've got a progress indicator to
> update. You can always roll your own $index:=$index+1 sort of thing.
> Reminder: All of the new V17 stuff is 0 (offset) indexed, not 1 (position)
> indexed.
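>
> The roll-your-own counter version I mean, for example (untested):
>
> *C_LONGINT*($index)
> $index:=0
> *For each* ($stuff_object;$stuff_es)
>   $index:=$index+1 // position of the current item, 1-based
>   *If* ($index%1000=0)
>     // update a progress indicator here, e.g. $index/$stuff_es.length
>   *End if*
> *End for each*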
>
> Honorable mention: *Selection to JSON*
> Yeah, kind of nice...a very excellent command in some situations. In this
> case, wildly wrong, I'd say. You load the whole JSON in one go so you get
> your source data + formatting + names. It's pretty flabby. Then you have to
> parse and walk that to get the proper text. If 4D had a Selection to text
> (->Table;Template) system that was *not* JSON, I'd be golden. That would be
> perfect. The *Selection to JSON* code doesn't allow in-line functions, so
> there's that. Oh, wait, 4D does have a command like this...*PROCESS 4D
> TAGS*.
> Hmmm. Yeah, probably the best approach for memory and the worst for
> brittleness. Not going there.
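>
> (For completeness, the shape of the *PROCESS 4D TAGS* route would be
> something like this - untested, and the field names living inside a template
> string are exactly the brittleness I mean:)
>
> $template:="<!--#4DLOOP [Stuff]--><!--#4DTEXT [Stuff]ID-->"+*Char*(*Carriage return*)+"<!--#4DENDLOOP-->"
> *PROCESS 4D TAGS*($template;$output_text) // walks the current selection of [Stuff]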
>
> Okay, so does anyone have any relevant, V17-based test results yet? I don't
> have the time or appetite to do the tests myself and won't be surprised if
> no one else has either. Not to be a **** about it, but I'm only interested
> in *test results*. It's fun to estimate program behavior from first
> principles, but it has pretty much zero predictive value. Having just spewed
> out a bunch of speculation, I certainly can't hold it against anyone else
> for riffing too.
>
> I've spent some embarrassing number of hours (for hours read "months") of
> my life testing 4D performance and, well, you have to test to find out.
> Conventional wisdom tends to be *worse* than random guessing. It's great to
> hear theories and stories from the folks at 4D, but that's all they
> are...stories and theories. Background information can give you a better
> idea of what to test and where to look, but that's all. Modern machines +
> modern OS + 4D + your code + all of the various subcomponents (RAM,
> network, SSD)...it's a lot. So, it's not a criticism to say only testing
> can hope to turn up meaningful results. Given all of those factors, narrow
> tests are ideal and obviously can't be generalized too far. Still, lots
> better than speculation!
>
> Thanks.



-- 
Kirk Brooks
San Francisco, CA
=======================

*We go vote - they go home*
**********************************************************************
4D Internet Users Group (4D iNUG)
Archive:  http://lists.4d.com/archives.html
Options: https://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:4d_tech-unsubscr...@lists.4d.com
**********************************************************************
