Ah, a couple of interesting ideas there. For the ProcessBuilder solution, I think it is too dangerous for this use case to pay off as an optimization. Java, being multithreaded by default, uses a fork implementation that doesn't seem to use copy-on-write. This means that if you have a 2GB process, you do a 2GB allocation every time you spawn a subprocess, just before exec'ing your 40k sync binary. This can cause really weird problems. We have been bitten by it in both Hadoop and our job scheduler, which spawn processes extensively to work around the lack of access to POSIX utilities from Java. :-(

Here is one of the bugs discussing this: https://issues.apache.org/jira/browse/HADOOP-5059
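For concreteness, this is roughly what the ProcessBuilder-based approach under discussion would amount to; this is only a sketch, and the /bin/sync path and the minimal error handling are my assumptions, not anything taken from the thread:

  // Sketch only: exec the system sync binary to flush every dirty file.
  // The cost being debated above is not the tiny binary itself but the fork
  // the JVM performs first, which (without copy-on-write) momentarily
  // duplicates the full memory footprint of the parent process.
  def syncAll(): Boolean = {
    val builder = new ProcessBuilder("/bin/sync")   // path is an assumption
    builder.redirectErrorStream(true)
    val proc = builder.start()
    proc.waitFor() == 0                             // sync exits 0 on success
  }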
I like your approach to logging. I think that because you have declared the
log message parameter as msg: => String the messages are evaluated lazily,
but what is the overhead of that closure? I would fear it might outweigh the
cost of the string concatenation it replaces. Is there any benchmarking on
this? If it isn't too expensive, I would love for us to switch to that style
of logging.

-Jay

On Sat, Nov 5, 2011 at 2:57 PM, Joe Stein <crypt...@gmail.com> wrote:
> Yeah, right, good point, that won't work for you.
>
> How about using ProcessBuilder? I do this for calling shell scripts
> whenever I can't do something in an API (or it is already implemented in
> bash scripts) and just need to execute something on the system.
>
> Some object like this:
>
>   private val reader = actor {
>     debug("created actor: " + Thread.currentThread)
>     var continue = true
>     loopWhile(continue) {
>       reactWithin(WAIT_TIME) {
>         case TIMEOUT =>
>           caller ! "react timeout"
>         case proc: Process =>
>           debug("entering first actor " + Thread.currentThread)
>           val streamReader = new java.io.InputStreamReader(proc.getInputStream)
>           val bufferedReader = new java.io.BufferedReader(streamReader)
>           val stringBuilder = new java.lang.StringBuilder()
>           var line: String = null
>           while ({ line = bufferedReader.readLine; line != null }) {
>             stringBuilder.append(line)
>             stringBuilder.append("\n")
>           }
>           bufferedReader.close()
>           caller ! stringBuilder.toString
>       }
>     }
>   }
>
>   def run(command: String, ran: (=> String) => Unit): Boolean = {
>     debug("gonna run a command: " + Thread.currentThread + " = " + command)
>     val args = command.split(" ")
>     val processBuilder = new ProcessBuilder(args: _*)
>     processBuilder.redirectErrorStream(true)
>     val proc = processBuilder.start()
>
>     // Send the proc to the actor, to extract the console output.
>     reader ! proc
>
>     // Receive the console output from the actor.
>     receiveWithin(WAIT_TIME) {
>       case TIMEOUT => ran("receiving Timeout")
>       case result: String => ran(result)
>     }
>
>     true
>   }
>
> Note that my debug statement comes from a logging trait that takes a
> by-name param, so I never have to write if (log.isDebugEnabled()). I was
> going to bring this up anyway; it's something I noticed while going
> through the code last night (e.g.
> https://github.com/joestein/skeletor/blob/master/src/main/scala/util/Logging.scala),
> so I figure now is a good time. You also never have to call
> Logger.getLogger because that comes with the trait.
>
> Maybe make this a generic utility, and then you can just call it for
> sync() at the OS level like you need?
>
> On Sat, Nov 5, 2011 at 5:26 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > But does that sync that file or all files? Looking for the latter...
> >
> > -jay
> >
> > Sent from my iPhone
> >
> > On Nov 5, 2011, at 1:44 PM, Joe Stein <crypt...@gmail.com> wrote:
> > > On the FileOutputStream you can get the FileDescriptor using getFD(),
> > > and then on that object you can sync().
> > >
> > > /*
> > > Joe Stein
> > > http://www.medialets.com
> > > Twitter: @allthingshadoop
> > > */
> > >
> > > On Nov 5, 2011, at 4:34 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > >> Does anyone know if there is an equivalent to the sync() system call
> > >> available in Java? This is the system call that flushes all files.
> > >> This seems like it might be a good optimization for the time-based
> > >> log flush.
> > >> If you are going to sequentially flush all the log partitions anyway,
> > >> it might be better to just do a single sync() and let the I/O
> > >> scheduler have more leeway in the ordering of the writes. I know how
> > >> to get the equivalent of fsync() or fdatasync() using
> > >> FileChannel.force(...) to flush a single file, but I don't know how
> > >> to get the equivalent of sync().
> > >>
> > >> -Jay
>
> --
> /*
> Joe Stein
> http://www.linkedin.com/in/charmalloc
> Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> */
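For reference, here is a minimal sketch of the by-name logging trait style being discussed. The trait name, the SLF4J backend, and the example class are assumptions for illustration only, not a copy of Joe's Logging.scala:

  import org.slf4j.{Logger, LoggerFactory}

  // Minimal by-name logging trait (illustrative assumption, not the real file).
  // Because msg is a by-name parameter, the argument expression is only
  // evaluated if the level check passes, so callers can skip the explicit
  // if (log.isDebugEnabled()) guard around expensive string concatenation.
  trait Logging {
    protected lazy val logger: Logger = LoggerFactory.getLogger(getClass)

    def debug(msg: => String): Unit = if (logger.isDebugEnabled) logger.debug(msg)
    def info(msg: => String): Unit  = if (logger.isInfoEnabled) logger.info(msg)
  }

  // Usage: mix in the trait; the concatenation below only runs at debug level.
  class FlushTask extends Logging {
    def flush(): Unit = debug("flushing at " + System.nanoTime)
  }

On the closure-overhead question: a by-name parameter compiles to a small Function0 thunk allocated at each call site, so the per-call cost is one short-lived object plus a virtual apply(). A quick microbenchmark could compare that against the string concatenation it avoids when the level is disabled.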
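And for the single-file flush mentioned in the quoted question, a rough sketch of the two per-file options; the helper names are mine, and neither call is a substitute for a global sync():

  import java.io.{File, FileOutputStream, RandomAccessFile}

  // fsync/fdatasync analogue for one file via FileChannel.force.
  def forceFile(file: File): Unit = {
    val raf = new RandomAccessFile(file, "rw")
    try raf.getChannel.force(true)   // true = data + metadata (fsync); false ~ fdatasync
    finally raf.close()
  }

  // Same effect through a FileOutputStream, as described in the quoted reply.
  def syncStream(out: FileOutputStream): Unit =
    out.getFD.sync()                 // flushes this descriptor only, not all files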