[ 
https://issues.apache.org/jira/browse/DIRSTUDIO-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592805#comment-16592805
 ] 

Aaron Burgemeister commented on DIRSTUDIO-1174:
-----------------------------------------------

The replaceAll with newline followed by carriage return followed by a space is, 
in my opinion, an invalid case since no OS does that (or, as far as I know, has 
ever done that):
        s = s.replaceAll( "\n\r ", "" ); //$NON-NLS-1$ //$NON-NLS-2$
The carriage-return-only line ending is valid for Mac OS 9 and earlier, but I 
am guessing almost nobody runs that anymore since OS X debuted in 2001, and if 
they do they probably cannot get Directory Studio on there.  Still, it's 
theoretically possible somebody could have an old file from there sent to 
somebody else with Directory Studio.  If that is deemed too unlikely a 
scenario, then we can take out this line:
        s = s.replaceAll( "\r ", "" ); //$NON-NLS-1$ //$NON-NLS-2$
That leaves the Windows carriage return followed by newline abomination, and 
the Linux/Unix/Mac OS X/etc. case of a simple newline.  Since all of these 
calls use the String object, which is immutable, each call basically recreates 
the String, and while the regex part is probably the slow part, the 
recreation of strings of this size probably does not help either.  It 
would be interesting to see which of the following performs best:
        s = s.replaceAll( "\r?\n ", "" ); //$NON-NLS-1$ //$NON-NLS-2$
vs.
        s = s.replaceAll( "\r", "" ); //$NON-NLS-1$ //$NON-NLS-2$
        s = s.replaceAll( "\n ", "" ); //$NON-NLS-1$ //$NON-NLS-2$
vs.
        s = s.replaceAll( "(?:\r\n|\n) ", "" ); //$NON-NLS-1$ //$NON-NLS-2$
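
A rough sketch of how such a comparison could be set up. The class name, method names, and synthetic input are all hypothetical and not taken from the Directory Studio code base. The third pattern is written here as "(?:\r\n|\n) " so that the trailing space applies to both alternatives. The timing loop is deliberately naive (no JIT warm-up control), so the numbers are only indicative; a harness such as JMH would be the better tool for a serious measurement.

```java
import java.util.function.UnaryOperator;

// Hypothetical comparison harness; all names are illustrative only.
final class UnfoldVariants {

    // Variant 1: optional CR, then LF, then the fold space.
    static String single(String s) {
        return s.replaceAll("\r?\n ", "");
    }

    // Variant 2: strip every CR first, then remove LF + space.
    // Unlike the other two, this also turns unfolded CRLF endings into LF.
    static String twoPass(String s) {
        s = s.replaceAll("\r", "");
        return s.replaceAll("\n ", "");
    }

    // Variant 3: explicit alternation, grouped so the space follows either branch.
    static String alternation(String s) {
        return s.replaceAll("(?:\r\n|\n) ", "");
    }

    // Naive wall-clock timing over a fixed number of rounds.
    static long timeNanos(UnaryOperator<String> f, String input, int rounds) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            sink += f.apply(input).length();
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            System.out.println(sink); // keeps the loop result observable
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Synthetic folded input with CRLF endings; real schema files will differ.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 5000; i++) {
            sb.append("attributeTypes: ( 1.2.3.").append(i)
              .append(" NAME 'example").append(i).append("'\r\n")
              .append("  DESC 'folded continuation' )\r\n");
        }
        String input = sb.toString();
        int rounds = 20;
        System.out.println("single      ns: " + timeNanos(UnfoldVariants::single, input, rounds));
        System.out.println("twoPass     ns: " + timeNanos(UnfoldVariants::twoPass, input, rounds));
        System.out.println("alternation ns: " + timeNanos(UnfoldVariants::alternation, input, rounds));
    }
}
```

One thing the sketch makes visible is that the variants are not strictly interchangeable: the two-pass version removes every carriage return, so a CRLF file comes out with LF line endings, while the other two leave unfolded line endings untouched.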
 

Also, is there a reason we fold the lines in the schema files saved out by 
Directory Studio?  If that is stopped, then a method to read schema files 
without trying to unfold them could be used.  I suspect folding for internal 
use is not that helpful (I personally think folding is not that helpful in 
general unless you really hate long lines, but this isn't meant for humans as 
much as computers), though I am sure we need the current methods to properly 
handle unfolding when getting files from outside Directory Studio.
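
As an illustration of what a cheaper read path might look like, here is a minimal sketch that skips all work when a file contains no folded lines and otherwise unfolds in a single pass with a StringBuilder instead of regex calls. The class and method names are hypothetical; this is not the existing Directory Studio method, and it only handles the LF and CRLF cases discussed above.

```java
// Hypothetical helper; not part of the Directory Studio code base.
final class LdifUnfolder {

    // Removes LDIF line folds ("\n " and "\r\n ") in one pass.
    static String unfold(String s) {
        // Fast path: both fold forms contain "\n ", so if it is absent
        // nothing is folded and the original String can be returned as-is.
        if (s.indexOf("\n ") < 0) {
            return s;
        }
        int n = s.length();
        StringBuilder out = new StringBuilder(n);
        int i = 0;
        while (i < n) {
            char c = s.charAt(i);
            // CRLF followed by a space: skip all three characters.
            if (c == '\r' && i + 2 < n
                    && s.charAt(i + 1) == '\n' && s.charAt(i + 2) == ' ') {
                i += 3;
                continue;
            }
            // LF followed by a space: skip both characters.
            if (c == '\n' && i + 1 < n && s.charAt(i + 1) == ' ') {
                i += 2;
                continue;
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }
}
```

With files that were written unfolded, the fast path means the cost is a single indexOf scan per file, while files from outside Directory Studio still get unfolded correctly.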

I went into my 
~/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core
 directory and unfolded the schema files manually and the load time decreased a 
little (two (2) to three (3) seconds), but since the replaceAll calls are still 
in there I would expect even better performance with the changes suggested 
above:
        for onefile in *.ldif; do sed -i -n '1 {h; $ !d}; $ {x; s/\n //g; p}; /^ / {H; d}; /^ /! {x; s/\n //g; p}' "${onefile}"; done
 

> Directory Studio startup very slow due to schema LDIF processing
> ----------------------------------------------------------------
>
>                 Key: DIRSTUDIO-1174
>                 URL: https://issues.apache.org/jira/browse/DIRSTUDIO-1174
>             Project: Directory Studio
>          Issue Type: Bug
>          Components: studio-connection
>    Affects Versions: 2.0.0-M13
>         Environment: openSUSE Linux (installed on my laptop)
> Sun/Oracle Java 1.8.0_111 (previously 1.7 with same issue)
> Apache Directory Studio 2.0.0 M12 and M13, plus earlier milestones too
>            Reporter: Aaron Burgemeister
>            Priority: Major
>              Labels: LDIF, schema, startup-time
>         Attachments: 20180415-no-load-schema-ldif-by-default.patch, 
> 20180416-dirstudio-1174-fix-a.patch, 20180821-schema-analysis-a.csv.bz2, 
> 20180821-schema-analysis-b.csv.bz2, 
> schema-9060594b-7c28-4123-b574-35fe09727283.ldif.bz2
>
>
> For the past couple years startup of Apache Directory Studio has slowed down 
> to the point where it takes more than a minute on my not-a-slouch laptop to 
> start.  Other systems, VMs with new installs, start much faster, even on the 
> same laptop, implying something other than the base product is at fault.  As 
> a result, I had suspected maybe Directory Studio slowed down precipitously 
> due to the number of stored connections, but never confirmed the same.
> Today I connected strace to the 'java' process as it started and noticed the 
> following:
>  
> [pid 30108] *1521902717*.154740 
> open("/home/ab/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core/schema-ba001fb7-4b83-4dca-be44-517c14139f4b.ldif",
>  O_RDONLY) = *-1 ENOENT (No such file or directory)*
> [pid 30108] *1521902717*.154906 
> stat("/home/ab/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core",
>  {st_mode=S_IFDIR|0755, st_size=5378, ...}) = 0
> [pid 30108] *1521902717*.154948 
> open("/home/ab/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core/schema-95e1202e-9a67-418c-afe9-b02f4e7c06df.ldif",
>  O_RDONLY) = *-1 ENOENT (No such file or directory)*
> [pid 30108] *1521902717*.155019 
> stat("/home/ab/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core",
>  {st_mode=S_IFDIR|0755, st_size=5378, ...}) = 0
> [pid 30108] *1521902717*.155053 
> open("/home/ab/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core/schema-687f43f6-9d05-4d08-b159-35b0e76dc95a.ldif",
>  O_RDONLY) = *-1 ENOENT (No such file or directory)*
> [pid 30108] *1521902717*.155120 
> stat("/home/ab/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core",
>  {st_mode=S_IFDIR|0755, st_size=5378, ...}) = 0
> [pid 30108] *1521902717*.155154 
> open("/home/ab/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core/schema-d62d0e10-c81e-4477-81a2-ac2c9e5c7169.ldif",
>  O_RDONLY) = *121*
> [pid 30108] *1521902718*.698702 
> stat("/home/ab/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core",
>  {st_mode=S_IFDIR|0755, st_size=5378, ...}) = 0
> [pid 30108] *1521902718*.698800 
> open("/home/ab/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core/schema-7b6a9a7c-2192-4b24-8874-1378e5b1b30c.ldif",
>  O_RDONLY) = *126*
> [pid 30108] *1521902719*.770570 
> stat("/home/ab/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core",
>  {st_mode=S_IFDIR|0755, st_size=5378, ...}) = 0
> [pid 30108] *1521902719*.770660 
> open("/home/ab/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core/schema-b3b02838-067f-4f24-bf92-6bf3fccdbc52.ldif",
>  O_RDONLY) = *127*
> [pid 30108] *1521902721*.198417 
> stat("/home/ab/.ApacheDirectoryStudio/.metadata/.plugins/org.apache.directory.studio.ldapbrowser.core",
>  {st_mode=S_IFDIR|0755, st_size=5378, ...}) = 0
>  
> Notice the timestamps (bolded near beginning of line) and how they change 
> based on whether or not a schema LDIF file was found (bolded near end of 
> line) and, presumably, processed.  When a file is not found, subsequent files 
> are sought immediately without significantly delaying startup.
> These schema files are all under 1 MiB in size, but most of them are several 
> hundred KiBs, approaching the 1 MiB size, so depending on what Directory 
> Studio is doing as it reads and processes these files, it would seem that 
> this introduces the slowness when a file is found.
> Looking for an existing issue I found DIRSTUDIO-1027 which may be related.  
> During startup of Directory Studio one of my laptop's eight cores is fully 
> utilized, which makes me think this may be more about processing the LDIF 
> than just swapping memory due to inefficient data structures, but I am not a 
> memory management expert, so I only mention the possibility here in case it 
> helps find the root cause quickly.
> My Directory Studio's total startup time: sixty-one (61) seconds.
> Time spent (per strace) reading schema files: fifty-five (55) seconds.
> Estimated non-schema startup time: six (6) seconds.
>  
> Steps to duplicate:
> Have a lot, e.g. 100, of stored schema LDIF files from previous connections.
> Start up Apache Directory Studio.
> Expected results: Quick startup.  Processing old schema LDIFs, when most of 
> them will not be used at any given time, seems like a waste of time in 
> general.  Perhaps this can be done only when a connection is accessed in some 
> way rather than at startup.
> Actual results: Slow startup.
> Reproducible: I think so, but am not sure why my system has these schema 
> LDIFs when others may not.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
