Re: [Factor-talk] potential memory issue

2015-10-12 Thread Björn Lindqvist
2015-10-02 20:12 GMT+02:00 HP wei :
> First,
> In factor's listener terminal (not in the gui window, though),
> Jon Harper suggested hitting Control-C and then 't' to terminate
> long-running code.
> When I hit Control-C in case (1) below, it brings up a low-level debugger (what
> a pleasant surprise).
>
> Let me ask a question first before I write more about investigating the
> issue.
> *** In the low-level debugger, one of the commands is 'data' to dump the data
> heap.
>  Is there any way to dump the result to a file?

No. But you can easily log the console output:

./factor -run=readline-listener |& tee -i out.log


> Summary of further investigation.
>
> The code
> 0 "a_path_to_big_folder" x [ link-info dup symbolic-link? [ drop ] [ size>>
> + ] if  ] each-file

I believe this code is a rough example of how to do it. Counting disk
usage in a real Linux directory tree is much more involved than
that: you need to account for hard links, virtual file systems, volatile
files and much more. Look at all the switches "man du" lists -- it is
complicated.


> (1) when x = t  (breadth-first, BFS)
>  the memory usage reported by Linux's 'top' shows a steady increase
>  from around 190M to as high as 2GB before either I killed it or it hit
>  the missing file issue.

I don't think you are hitting a missing file issue. In
/proc/<pid>/fd there is an extra ephemeral file which shows up
because listing the contents of a directory requires opening it,
which creates a file descriptor. You can trigger the same problem in
Python using:

import os
[os.stat('/proc/%d/fd/%s' % (os.getpid(), f)) for f in os.listdir('/proc/%d/fd' % os.getpid())]


>   But the total-file-size of about 280GB is incorrect.  It should be
> around 74GB.

This could be because the sizes of /proc files are being counted. The
/proc/kcore file, in particular, is enormous.
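
If the traversal really is wandering into /proc (for instance via a
symlink), one cheap guard is to skip such paths before calling link-info
on them. A minimal sketch -- the word name and the prefix list are only
illustrative, not anything from the code in this thread:

USING: kernel sequences ;

! Hypothetical helper: true for paths under pseudo file systems
! whose "files" should never be counted.
: pseudo-fs? ( path -- ? )
    { "/proc/" "/sys/" } [ head? ] with any? ;

Inside the each-file quotation you would then drop any path for which
pseudo-fs? returns true.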


> For the above disk,  DFS appears to consume much less memory !
> But the resulting file size is incorrect (280GB instead of 70GB).
> This is presumably due to (NOTE-A) and the code must have scanned through
> those
> OTHER disks.  But then the extra scanning appears to be incomplete!

It's hard to say what might be up. But if the disks are mounted under
the directory you supplied to each-file, then the files on those disks
will be counted.
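
If the stray gigabytes are coming in through symlinks onto other disks,
one way to filter them out is to compare device ids, similar to "du -x".
A rough sketch -- it assumes the Unix file-info tuple exposes the device
id as dev>>, alongside the ino>> and nlink>> slots used in the du-tree
code further down, so check io.files.info.unix before relying on it:

USING: accessors io.files.info kernel ;

! Hypothetical helper: true when path sits on the same device as root.
! Assumes link-info returns a Unix file-info with a dev>> slot.
: same-device? ( root path -- ? )
    [ link-info dev>> ] bi@ = ;

Looking up the root's device once and currying it into the quotation
passed to each-file would avoid re-stat'ing the root for every file.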

> In closing,  the simple code (with DFS)
> 0 "a_path_to_big_folder" f [ link-info dup symbolic-link? [ drop ] [
> size>> + ] if  ] each-file
> could NOT achieve the intended action --- to sum up the file-size for files
> residing in a
> disk (as pointed to by a_path_to_big_folder).

That is not surprising. Here is a better method to do it:

USING: accessors combinators.short-circuit continuations
io.directories.search io.files.info io.files.types kernel math
math.order namespaces sets ;

! Filter hardlinks
SYMBOL: seen-inos

: regular-file-size ( file-info -- s )
    ! In case it's one of the fake huge /proc files
    [ size>> ] [ size-on-disk>> ] bi min ;

: count-file-info? ( link -- ? )
    {
        [ type>> +regular-file+ = ]
        [ { [ nlink>> 1 = ] [ ino>> seen-inos get ?adjoin ] } 1|| ]
    } 1&& ;

: file-info-size ( link -- s )
    dup count-file-info? [ regular-file-size ] [ drop 0 ] if ;

: file-size ( path -- s )
    [ link-info file-info-size ] [ 2drop 0 ] recover ;

: du-tree ( path -- s )
    HS{ } clone seen-inos set
    0 swap t [ file-size + ] each-file ;

It gives decent disk usage counts for me. It underreports the total in
comparison with "du -s --si" because I excluded directory sizes.
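
To try it, paste the definitions into the listener (or drop them into a
vocabulary of your own) and hand du-tree the root you want to measure:

"a_path_to_big_folder" du-tree .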


--
mvh/best regards Björn Lindqvist

--
___
Factor-talk mailing list
Factor-talk@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/factor-talk


Re: [Factor-talk] potential memory issue

2015-10-02 Thread HP wei
First,
In Factor's listener terminal (not in the GUI window, though),
Jon Harper suggested hitting Control-C and then 't' to terminate
long-running code.
When I hit Control-C in case (1) below, it brings up a low-level debugger (what
a pleasant surprise).

Let me ask a question first before I write more about investigating the
issue.
*** In the low-level debugger, one of the commands is 'data' to dump the data
heap.
 Is there any way to dump the result to a file?



Summary of further investigation.

The code
0 "a_path_to_big_folder" x [ link-info dup symbolic-link? [ drop ] [ size>>
+ ] if  ] each-file

(1) when x = t  (breadth-first, BFS)
 the memory usage reported by Linux's 'top' shows a steady increase
 from around 190M to as high as 2GB before either I killed it or it hit
 the missing file issue.

(2) when x = f  (depth-first, DFS)
 Watching RES from 'top', I noticed that
 the memory usage even drops from 190M to around 94M before I went home
 and let the code run in the office.
 The next morning, I found that it finished OK with a total-file-size
 on the data stack.

  But the total-file-size of about 280GB is incorrect.  It should be
around 74GB.

-

Just a reminder, our disk has the following properties.

   it is a disk with a tree of directories.
   directory count ~ 6000
   total number of files as of now ~ 1.1 million
   total number of softlinks ~ 57
   total file size ~ 70GB

   number of files in each sub-directory (not counting the files in
   sub-directories inside it)
   ranges from a few hundred to as high as ~10K.

   (NOTE-A) Some of the folders are in fact softlinks that link to "OTHER
   disk locations".



For the above disk, DFS appears to consume much less memory!
But the resulting file size is incorrect (280GB instead of 70GB).
This is presumably due to (NOTE-A), and the code must have scanned through
those OTHER disks. But then the extra scanning appears to be incomplete,
because 280GB is too small. A complete traversal of the above disk plus all
those OTHER disks would amount to a few terabytes.

So, somewhere the traversal went wrong. This may be another
investigation for another day.

--

For case (1), I hit Control-C to bring up the low-level debugger
and typed 'data' to look at the data heap content.
It is a LONG LONG list of stuff containing many tuples describing the
directory entries.

I typed 'c' to let the code continue for a while,
hit Control-C again,
and then 'data' to look at the data heap.
Since the list is TOO long to fit the screen, I could not see any
significant difference
in the last few lines of the output between this 'data' and the last one.

It would be nice to be able to dump the 'data' result to a file.
Then a more comprehensive comparison could be done.

I also tried typing 'gc' to invoke a round of garbage collection.
But nothing seemed to be affected.  The memory as monitored by 'top'
remained unchanged.



In closing,  the simple code (with DFS)
0 "a_path_to_big_folder" f [ link-info dup symbolic-link? [ drop ] [
size>> + ] if  ] each-file
could NOT achieve the intended action --- to sum up the file sizes for the files
residing on a
disk (as pointed to by a_path_to_big_folder).

A custom iterator needs to be coded, after all.

Finally, the memory issue with BFS may simply be due to the algorithm
requiring a LOT of
memory to store all the directory entries at a certain depth in the tree.
If we could dump the 'data' content in the debugger to a file, I could see
more clearly
by comparing the content at two distinct moments (say when RES (from
'top')
reaches 1GB and when it reaches 2GB).

--HP
--
___
Factor-talk mailing list
Factor-talk@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/factor-talk


Re: [Factor-talk] potential memory issue --- Fwd: how to error trapping 'link-info'

2015-10-01 Thread John Benediktsson
Maybe you can debug a little if you see that happen again?

Perhaps something like this to get the largest number of instances, if
there is a per-file leak:

IN: scratchpad all-instances [ class-of ] histogram-by sort-values reverse 10 head .

Some other words for inspecting memory:

http://docs.factorcode.org/content/article-tools.memory.html

Can you give us some information about your disk layout?

Is it one big directory with 1 million files?  Is it a tree of
directories?  What do you think is the average number of files per directory?

I opened a bug report if you'd like to provide feedback there rather than
the mailing list:

https://github.com/slavapestov/factor/issues/1483




On Thu, Oct 1, 2015 at 8:38 AM, HP wei  wrote:

> Well, I just checked the running factor session that failed the
> task overnight that I mentioned in below email.
>
> From the linux system command 'top',
> I see that this particular factor is using
> VIRT   4.0g
> RES   2.0g
> %MEM 26%
>
> I clicked on the restart listener button and the numbers remain the same.
> should I have done more to clean up the memory usage ?
>
> --
>
> For comparison, I killed the factor session and restart it from the shell.
> The numbers are
> VIRT  940M
> RES  182M
> %MEM 2.2%
>
> ==> Had the factor continued to run last night,
>it would have probably exhausted the memory on the machine.
>I guess there might be some memory (leak) issue somewhere ???
>
> --HP
>
>
>
> -- Forwarded message --
> From: HP wei 
> Date: Thu, Oct 1, 2015 at 9:36 AM
> Subject: how to error trapping 'link-info'
> To: factor-talk@lists.sourceforge.net
>
>
> As suggested by John, I test out the following action to
> get the total file sizes of a disk volume.
>
> 0 "a_path_to_big_folder" [ link-info dup symbolic-link? [ drop ] [ size>>
> + ] if  ] each-file
>
>
> Our big-folder is on a netapp server shared by tens of people. Many small
> files get updated
> every minutes if not seconds. The update may involve removing the file
> first.
> It has many many subfolders which in turn have more subfolders.
> Each subfolder may have hundreds of files (occasionally in the thousands).
>
> After a few day's discussion with factor guru's, I understand that
> each-file traverses the directory structure by first putting
> entries of a folder in a sequence. And it processes each entry one by one.
> Although this may not cause using big chunk of memory at a time,
> it does have the following issue..
>
> 
>
> Last night, I left the command running and came back this morning to find
> that it failed with the message.
> lstat:  "a path to a file" does not exist !!!
>
> This is because after 'each-file' puts the file into the sequence and then
> when
> it is its turn to be processed, it is not there at the time!!
> Without error trapping, the above "0 ... each-file"  could not work in our
> case.
>
> So, I guess I would need to do error-trapping on the word link-info.
> I do not know how to do it.  Any hint ?
>
> Thanks
> HP
>
>
>
>
> --
>
> ___
> Factor-talk mailing list
> Factor-talk@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/factor-talk
>
>
--
___
Factor-talk mailing list
Factor-talk@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/factor-talk


[Factor-talk] potential memory issue --- Fwd: how to error trapping 'link-info'

2015-10-01 Thread HP wei
Well, I just checked the running Factor session that failed the
overnight task that I mentioned in the email below.

From the Linux system command 'top',
I see that this particular factor is using
VIRT   4.0g
RES   2.0g
%MEM 26%

I clicked on the restart-listener button and the numbers remained the same.
Should I have done more to clean up the memory usage?

--

For comparison, I killed the Factor session and restarted it from the shell.
The numbers are
VIRT  940M
RES  182M
%MEM 2.2%

==> Had Factor continued to run last night,
    it would probably have exhausted the memory on the machine.
    I guess there might be some memory (leak) issue somewhere???

--HP



-- Forwarded message --
From: HP wei 
Date: Thu, Oct 1, 2015 at 9:36 AM
Subject: how to error trapping 'link-info'
To: factor-talk@lists.sourceforge.net


As suggested by John, I tested out the following action to
get the total file size of a disk volume.

0 "a_path_to_big_folder" [ link-info dup symbolic-link? [ drop ] [ size>> +
] if  ] each-file


Our big-folder is on a NetApp server shared by tens of people. Many small
files get updated
every minute if not every second. The update may involve removing the file
first.
It has many, many subfolders which in turn have more subfolders.
Each subfolder may have hundreds of files (occasionally thousands).

After a few days' discussion with Factor gurus, I understand that
each-file traverses the directory structure by first putting the
entries of a folder into a sequence, and then processing each entry one by one.
Although this may not use a big chunk of memory at a time,
it does have the following issue.



Last night, I left the command running and came back this morning to find
that it had failed with the message:
lstat:  "a path to a file" does not exist !!!

This is because after 'each-file' puts a file into the sequence, by the time
it is that file's turn to be processed, it is no longer there!!
Without error trapping, the above "0 ... each-file" could not work in our
case.

So, I guess I would need to do error-trapping on the word link-info.
I do not know how to do it.  Any hint ?
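
The recover combinator from the continuations vocabulary handles exactly
this -- it is the same pattern used in Björn's du-tree code elsewhere in
this thread. A minimal sketch (the USING: line is a best guess at the
vocabularies needed):

USING: accessors continuations io.directories.search io.files.info
kernel math ;

! Hypothetical wrapper: size of a path via link-info, 0 for symlinks
! and for files that vanish between listing and stat'ing.
: safe-link-size ( path -- n )
    [ link-info dup symbolic-link? [ drop 0 ] [ size>> ] if ]
    [ 2drop 0 ] recover ;

! 0 "a_path_to_big_folder" f [ safe-link-size + ] each-file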

Thanks
HP
--
___
Factor-talk mailing list
Factor-talk@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/factor-talk


Re: [Factor-talk] potential memory issue --- Fwd: how to error trapping 'link-info'

2015-10-01 Thread HP wei
Yes, I could find out a bit more about the memory issue.

I tried it again this afternoon.  Fifty minutes into the action
  0 "path" t [ link-info ... ] each-file
the system 'top' showed RES rising above 1.2GB and %MEM reaching 15.7%,
and they continued to rise.
It blacked out the GUI window of Factor.

I tried hitting Control-C but it continued to run.
*** How do I exit a running word?

It looks like the only natural way I know of to 'stop' it is to wait for
link-info to hit the missing file scenario --- like the overnight run of
last night.

So, I just killed the Factor session from the shell, and missed the
opportunity
to inspect the memory usage in Factor, as John suggested.

Is there a way to exit a running word?
[ Perhaps I need to learn to use a Factor debugger? ]

-

Replying to John's questions about the disk layout:
   it is a disk with a tree of directories.
   directory count ~ 6000
   total number of files as of now ~ 1.1 million
   total number of softlinks ~ 57
   total file size ~ 70GB

   number of files in each sub-directory (not counting the files in
   sub-directories inside it)
   ranges from a few hundred to as high as ~10K.

   Some of the directories are constantly updated throughout the day.

--HP



On Thu, Oct 1, 2015 at 12:27 PM, John Benediktsson  wrote:

> Maybe you can debug a little if you see that happen again?
>
> Perhaps something like this to get the largest number of instances, if
> there is a per-file leak:
>
> IN: scratchpad all-instances [ class-of ] histogram-by
> sort-values reverse 10 head .
>
> Some other words for inspecting memory:
>
> http://docs.factorcode.org/content/article-tools.memory.html
>
> Can you give us some information about your disk layout?
>
> Is it one big directory with 1 million files?  Is it a tree of
> directories?  What do you think is average number of files per-directory?
>
> I opened a bug report if you'd like to provide feedback there rather than
> the mailing list:
>
> https://github.com/slavapestov/factor/issues/1483
>
>
>
>
> On Thu, Oct 1, 2015 at 8:38 AM, HP wei  wrote:
>
>> Well, I just checked the running factor session that failed the
>> task overnight that I mentioned in below email.
>>
>> From the linux system command 'top',
>> I see that this particular factor is using
>> VIRT   4.0g
>> RES   2.0g
>> %MEM 26%
>>
>> I clicked on the restart listener button and the numbers remain the same.
>> should I have done more to clean up the memory usage ?
>>
>> --
>>
>> For comparison, I killed the factor session and restart it from the shell.
>> The numbers are
>> VIRT  940M
>> RES  182M
>> %MEM 2.2%
>>
>> ==> Had the factor continued to run last night,
>>it would have probably exhausted the memory on the machine.
>>I guess there might be some memory (leak) issue somewhere ???
>>
>> --HP
>>
>>
>>
>> -- Forwarded message --
>> From: HP wei 
>> Date: Thu, Oct 1, 2015 at 9:36 AM
>> Subject: how to error trapping 'link-info'
>> To: factor-talk@lists.sourceforge.net
>>
>>
>> As suggested by John, I test out the following action to
>> get the total file sizes of a disk volume.
>>
>> 0 "a_path_to_big_folder" [ link-info dup symbolic-link? [ drop ] [ size>>
>> + ] if  ] each-file
>>
>>
>> Our big-folder is on a netapp server shared by tens of people. Many small
>> files get updated
>> every minutes if not seconds. The update may involve removing the file
>> first.
>> It has many many subfolders which in turn have more subfolders.
>> Each subfolder may have hundreds of files (occasionally in the thousands).
>>
>> After a few day's discussion with factor guru's, I understand that
>> each-file traverses the directory structure by first putting
>> entries of a folder in a sequence. And it processes each entry one by one.
>> Although this may not cause using big chunk of memory at a time,
>> it does have the following issue..
>>
>> 
>>
>> Last night, I left the command running and came back this morning to find
>> that it failed with the message.
>> lstat:  "a path to a file" does not exist !!!
>>
>> This is because after 'each-file' puts the file into the sequence and
>> then when
>> it is its turn to be processed, it is not there at the time!!
>> Without error trapping, the above "0 ... each-file"  could not work in
>> our case.
>>
>> So, I guess I would need to do error-trapping on the word link-info.
>> I do not know how to do it.  Any hint ?
>>
>> Thanks
>> HP
>>
>>
>>
>>
>> --
>>
>> ___
>> Factor-talk mailing list
>> Factor-talk@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/factor-talk
>>
>>
>
>
> --
>
> 

Re: [Factor-talk] potential memory issue --- Fwd: how to error trapping 'link-info'

2015-10-01 Thread Doug Coleman
You can run your code in the leaks combinator and it will show you what
leaked. I suspect that you're just using a lot of memory though.

[ { 1 2 3 } [ malloc drop ] each ] leaks members .
{ ~malloc-ptr~ ~malloc-ptr~ ~malloc-ptr~ }
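
A note for anyone pasting that into a fresh listener: malloc and leaks
are not in the default search path, so you will probably need something
like this first (the vocabulary names are my best guess, so double-check
them in the help browser):

USING: libc tools.destructors ;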

On Thu, Oct 1, 2015 at 12:31 PM, HP wei  wrote:

> Yes, I could find out a bit more about the memory issue.
>
> I tried it again this afternoon.  After 50 minutes into the action
>   0 "path" t [ link-info ... ] each-file
> the system 'top' shows RES rises above 1.2GB and %MEM becomes 15.7%
> and they continue to rise.
> It blacks out the gui window of factor.
>
> I try to hit Control-C but it continues to run.
> *** How to exit a running words ?
>
> It looks like the only natural way I know of to 'stop' it is to wait for
> link-info to hit the missing file scenario --- like the overnight run of
> last night.
>
> So, I just killed the factor session from the shell.  And missed the
> opportunity
> to inspect the memory usage in factor, as John suggested.
>
> Is there a way to exit running words ?
> [ perhaps, I need to learn to use a factor-debugger ? ]
>
> -
>
> Replying to John's questions about the disk layout:
>it is a disk with a tree of directories.
>directory count ~ 6000
>total number of files as of now ~ 1.1 million
>total number of softlinks ~ 57
>total file size ~ 70GB
>
>number of files in each sub-directory (not including the files in
> sub-directory inside it)
>range from a few hundreds to as high as of the order of <~10K.
>
>Some of the directories are constantly updated throughout the day.
>
> --HP
>
>
>
> On Thu, Oct 1, 2015 at 12:27 PM, John Benediktsson 
> wrote:
>
>> Maybe you can debug a little if you see that happen again?
>>
>> Perhaps something like this to get the largest number of instances, if
>> there is a per-file leak:
>>
>> IN: scratchpad all-instances [ class-of ] histogram-by
>> sort-values reverse 10 head .
>>
>> Some other words for inspecting memory:
>>
>> http://docs.factorcode.org/content/article-tools.memory.html
>>
>> Can you give us some information about your disk layout?
>>
>> Is it one big directory with 1 million files?  Is it a tree of
>> directories?  What do you think is average number of files per-directory?
>>
>> I opened a bug report if you'd like to provide feedback there rather than
>> the mailing list:
>>
>> https://github.com/slavapestov/factor/issues/1483
>>
>>
>>
>>
>> On Thu, Oct 1, 2015 at 8:38 AM, HP wei  wrote:
>>
>>> Well, I just checked the running factor session that failed the
>>> task overnight that I mentioned in below email.
>>>
>>> From the linux system command 'top',
>>> I see that this particular factor is using
>>> VIRT   4.0g
>>> RES   2.0g
>>> %MEM 26%
>>>
>>> I clicked on the restart listener button and the numbers remain the same.
>>> should I have done more to clean up the memory usage ?
>>>
>>> --
>>>
>>> For comparison, I killed the factor session and restart it from the
>>> shell.
>>> The numbers are
>>> VIRT  940M
>>> RES  182M
>>> %MEM 2.2%
>>>
>>> ==> Had the factor continued to run last night,
>>>it would have probably exhausted the memory on the machine.
>>>I guess there might be some memory (leak) issue somewhere ???
>>>
>>> --HP
>>>
>>>
>>>
>>> -- Forwarded message --
>>> From: HP wei 
>>> Date: Thu, Oct 1, 2015 at 9:36 AM
>>> Subject: how to error trapping 'link-info'
>>> To: factor-talk@lists.sourceforge.net
>>>
>>>
>>> As suggested by John, I test out the following action to
>>> get the total file sizes of a disk volume.
>>>
>>> 0 "a_path_to_big_folder" [ link-info dup symbolic-link? [ drop ] [
>>> size>> + ] if  ] each-file
>>>
>>>
>>> Our big-folder is on a netapp server shared by tens of people. Many
>>> small files get updated
>>> every minutes if not seconds. The update may involve removing the file
>>> first.
>>> It has many many subfolders which in turn have more subfolders.
>>> Each subfolder may have hundreds of files (occasionally in the
>>> thousands).
>>>
>>> After a few day's discussion with factor guru's, I understand that
>>> each-file traverses the directory structure by first putting
>>> entries of a folder in a sequence. And it processes each entry one by
>>> one.
>>> Although this may not cause using big chunk of memory at a time,
>>> it does have the following issue..
>>>
>>> 
>>>
>>> Last night, I left the command running and came back this morning to find
>>> that it failed with the message.
>>> lstat:  "a path to a file" does not exist !!!
>>>
>>> This is because after 'each-file' puts the file into the sequence and
>>> then when
>>> it is its turn to be processed, it is not there at the time!!
>>> Without error trapping, the above "0 ... each-file"  could not work in
>>> our case.
>>>
>>> So, I guess I would