Cool, I tried a similar test: https://gist.github.com/1343069
Here are the results I get for logging debug to stdout and redirecting to a file:

isDebugEnabled with logging = DEBUG: 9673.7 ns per invocation
Debug helper with logging = DEBUG: 10248.6 ns per invocation
isDebugEnabled with logging = INFO: 22.3 ns per invocation
Debug helper with logging = INFO: 17.2 ns per invocation

isDebugEnabled is the explicit check and "debug helper" is the lazy value. The important case is the logging = INFO one, as that is the case you are trying to optimize with the isDebugEnabled check. So there really is no difference in the debug helper case here; it looks like, at least for this microbenchmark, the JVM can optimize it away. There does seem to be some overhead for the helper method, as it is persistently around 500ns slower, but I don't think that is a problem since the logging itself takes ~10us.

Thanks for this pointer! I recommend we definitely move to this more readable style. If you are willing to help with the conversion that would of course be fantastic.

-Jay

On Sat, Nov 5, 2011 at 10:44 PM, Joe Stein <crypt...@gmail.com> wrote:
> So, for the possible new log style, and whether it is performance hindering, I
> created some test code
>
> https://github.com/joestein/qetesh/blob/master/src/main/scala/logstyletest.scala
> to simulate how it would act
>
> can be run as a standalone jar: just clone the project and then in qetesh
>
> java -jar target/scala-2.9.1/logstyletest_2.9.1-1.0.0.0-one-jar.jar 1000 100000 10
>
> first param is how many threads in the pool, second is how many pushes to
> the pool, and third is how many times to repeat the test (since it will be
> bumping into things not related to the test that will slow it down, since it
> is contending for resources).
>
> running on real hardware and EC2, both with large numbers and very small, I do
> not see any overhead from the log style
>
> let me know if/how you want to proceed, any questions with the test I put
> together, or if you want something else covered in the test as well.
> I would be willing to sift through and make the changes for a patch if/when
> it gets some +1
>
> /*
> Joe Stein
> http://www.linkedin.com/in/charmalloc
> Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> */
>
> On Sat, Nov 5, 2011 at 8:02 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > Ah, a couple interesting ideas there.
> >
> > For the process builder solution, I think it is too dangerous for this use
> > to pay off as an optimization. Java, being multithreaded by default, uses a
> > fork implementation that doesn't seem to use copy-on-write. This means that
> > if you have a 2GB process, you will do a 2GB allocation every time you
> > instantiate a process before execing your 40k sync binary. This can cause
> > really weird problems. We have been bitten by it in Hadoop and in our job
> > scheduler, both of which do that extensively to work around the lack of
> > access to posix utilities. :-(
> >
> > Here is one of the bugs discussing this:
> > https://issues.apache.org/jira/browse/HADOOP-5059
> >
> > I like your approach to logging, and I think that because you have declared
> > the type for the log statements as msg: => String they are lazy, but what
> > is the overhead of that closure? I would fear it might outweigh the cost of
> > the string concatenation it replaces. I wonder if there is any benchmarking
> > on this? If it isn't too expensive, I would love for us to switch to that
> > style of logging.
> >
> > -Jay
> >
> > On Sat, Nov 5, 2011 at 2:57 PM, Joe Stein <crypt...@gmail.com> wrote:
> > > Yeah, right, good point; that won't work for you.
> > >
> > > How about using ProcessBuilder? I do this for calling shell scripts
> > > whenever I can't do something in an API (or it is already implemented in
> > > bash scripts) and just need to execute something on the system.
> > > some object like this:
> > >
> > > private val reader = actor {
> > >   debug("created actor: " + Thread.currentThread)
> > >   var continue = true
> > >   loopWhile(continue) {
> > >     reactWithin(WAIT_TIME) {
> > >       case TIMEOUT =>
> > >         caller ! "react timeout"
> > >       case proc: Process =>
> > >         debug("entering first actor " + Thread.currentThread)
> > >         val streamReader = new java.io.InputStreamReader(proc.getInputStream)
> > >         val bufferedReader = new java.io.BufferedReader(streamReader)
> > >         val stringBuilder = new java.lang.StringBuilder()
> > >         var line: String = null
> > >         while ({line = bufferedReader.readLine; line != null}) {
> > >           stringBuilder.append(line)
> > >           stringBuilder.append("\n")
> > >         }
> > >         bufferedReader.close
> > >         caller ! stringBuilder.toString
> > >     }
> > >   }
> > > }
> > >
> > > def run(command: String, ran: (=> String) => Unit): Boolean = {
> > >   debug("gonna run a command: " + Thread.currentThread + " = " + command)
> > >   val args = command.split(" ")
> > >   val processBuilder = new ProcessBuilder(args: _*)
> > >   processBuilder.redirectErrorStream(true)
> > >   val proc = processBuilder.start()
> > >
> > >   // Send the proc to the actor, to extract the console output.
> > >   reader ! proc
> > >
> > >   // Receive the console output from the actor.
> > >   receiveWithin(WAIT_TIME) {
> > >     case TIMEOUT => ran("receiving Timeout")
> > >     case result: String => ran(result)
> > >   }
> > >
> > >   true
> > > }
> > >
> > > note my debug statement comes from a logging trait, so there is no need to
> > > wrap calls in if (log.isDebugEnabled()), thanks to the by-name param. This is
> > > something I noticed while going through the code last night (e.g.
> > > https://github.com/joestein/skeletor/blob/master/src/main/scala/util/Logging.scala
> > > ) and figured now is a good time to bring it up. Also, you never have to call
> > > Logger.getLogger yourself, because that comes with the trait.
> > > Maybe a generic utility, and then you can just call it and use it for sync()
> > > at the OS level like you need?
> > >
> > > On Sat, Nov 5, 2011 at 5:26 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > > > But does that sync that one file or all files? Looking for the latter...
> > > >
> > > > -jay
> > > >
> > > > Sent from my iPhone
> > > >
> > > > On Nov 5, 2011, at 1:44 PM, Joe Stein <crypt...@gmail.com> wrote:
> > > > > On the FileOutputStream you can get the FileDescriptor using getFD()
> > > > > and then on that object you can sync()
> > > > >
> > > > > /*
> > > > > Joe Stein
> > > > > http://www.medialets.com
> > > > > Twitter: @allthingshadoop
> > > > > */
> > > > >
> > > > > On Nov 5, 2011, at 4:34 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > > > >> Does anyone know if there is an equivalent to the sync() system call
> > > > >> available in Java? This is the system call that flushes all files.
> > > > >> This seems like it might be a good optimization for the time-based
> > > > >> log flush. If you are going to sequentially flush all the log
> > > > >> partitions anyway, it might be better to just do a single sync() and
> > > > >> let the I/O scheduler have more leeway in the ordering of the writes.
> > > > >> I know how to get the equivalent of fsync() or fdatasync() using
> > > > >> FileChannel.force(...) to flush a single file, but I don't know how
> > > > >> to get the equivalent of sync().
> > > > >>
> > > > >> -Jay
> > >
> > > --
> > >
> > > /*
> > > Joe Stein
> > > http://www.linkedin.com/in/charmalloc
> > > Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > > */
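[Editor's note: the by-name logging trait discussed in the thread can be sketched as below. This is my own minimal illustration, not the skeletor or Kafka code; the `debugEnabled` flag and `println` stand in for a real log4j `Logger`, and `Worker` is a hypothetical caller.]

```scala
// Minimal sketch of a by-name Logging trait. println stands in for log4j
// so the example is self-contained; a real trait would wrap Logger.getLogger.
trait Logging {
  // Stand-in for log.isDebugEnabled(); defaults to off.
  protected var debugEnabled: Boolean = false

  // msg is call-by-name (=> String): the string is never built unless
  // debug is enabled, so callers need no explicit isDebugEnabled guard.
  def debug(msg: => String): Unit =
    if (debugEnabled) println("DEBUG " + getClass.getName + ": " + msg)
}

// Hypothetical caller: the concatenation inside debug(...) is skipped
// entirely when debug logging is off.
class Worker extends Logging {
  def doWork(n: Int): Int = {
    debug("doing work on " + n)
    n * 2
  }
}
```

The point of the style is that `debug("x = " + expensive())` costs only a cheap closure allocation when debug is off, instead of the string concatenation itself.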
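[Editor's note: the kind of microbenchmark Jay describes at the top of the thread can be sketched like this. This is my own illustration, not the code from the gist; the `debugEnabled` flag, `sink` accumulator, and iteration count are assumptions, and the absolute numbers will differ from the ones quoted above.]

```scala
// Sketch of the explicit-check vs. by-name-helper microbenchmark.
object LogStyleBench {
  @volatile var debugEnabled = false // stand-in for the logger's level check
  var sink = 0                       // keeps the JIT from eliding the work

  // Style 1: caller guards the string concatenation explicitly.
  def explicitCheck(x: Int): Unit =
    if (debugEnabled) sink += ("value = " + x).length

  // Style 2: by-name helper; the message closure is only forced if enabled.
  def debug(msg: => String): Unit =
    if (debugEnabled) sink += msg.length

  def time(label: String)(body: => Unit): Long = {
    val n = 1000000
    val start = System.nanoTime
    var i = 0
    while (i < n) { body; i += 1 }
    val perCall = (System.nanoTime - start) / n
    println(label + ": " + perCall + " ns per invocation")
    perCall
  }

  def main(args: Array[String]): Unit = {
    // The case that matters: debug disabled (the logging = INFO case above).
    time("explicit check, debug off")(explicitCheck(42))
    time("by-name helper, debug off")(debug("value = " + 42))
  }
}
```

In the disabled case both bodies reduce to a branch on a flag, which is why the thread sees essentially identical numbers for the two styles.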
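[Editor's note: the two per-file flush mechanisms mentioned in the thread, FileDescriptor.sync() and FileChannel.force(...), can be sketched as below. This is my own self-contained example; as Jay notes, neither is a whole-filesystem sync(), for which the JDK offers no portable equivalent. The helper name writeAndSync is mine.]

```scala
import java.io.{File, FileOutputStream, RandomAccessFile}

object FlushDemo {
  // Writes to the file with both flush mechanisms and returns its length.
  def writeAndSync(file: File): Long = {
    // Mechanism 1: FileDescriptor.sync() via FileOutputStream.getFD --
    // roughly fsync() on this one file.
    val out = new FileOutputStream(file)
    out.write("hello\n".getBytes("UTF-8"))
    out.flush()        // drain user-space buffers to the OS
    out.getFD.sync()   // ask the OS to push this file's data to the device
    out.close()

    // Mechanism 2: FileChannel.force(boolean), the NIO equivalent.
    // force(true) ~ fsync (data + metadata); force(false) ~ fdatasync.
    val raf = new RandomAccessFile(file, "rw")
    raf.seek(raf.length)
    raf.write("world\n".getBytes("UTF-8"))
    raf.getChannel.force(true)
    raf.close()

    file.length
  }

  def main(args: Array[String]): Unit = {
    val file = File.createTempFile("flush-demo", ".log")
    file.deleteOnExit()
    println(writeAndSync(file)) // prints 12: both 6-byte writes are durable
  }
}
```

Both calls block until the data reaches the device, which is why batching flushes (or a single system-wide sync(), if it were available) gives the I/O scheduler more freedom to reorder writes.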